DOCUMENT RESUME

ED 156 719          95          TM 007 286

AUTHOR          House, Ernest R.
TITLE           The Logic of Evaluative Argument. CSE Monograph
                Series in Evaluation, 7.
INSTITUTION     California Univ., Los Angeles. Center for the Study
                of Evaluation.
SPONS AGENCY    National Inst. of Education (DHEW), Washington, D.C.
PUB DATE        77
CONTRACT        400-77-0034
NOTE            69p.
AVAILABLE FROM  Center for the Study of Evaluation, UCLA Graduate
                School of Education, 405 Hilgard Avenue, Los Angeles,
                California 90024 ($4.50)
EDRS PRICE      MF-$0.83 HC-$3.50 Plus Postage.
DESCRIPTORS     Abstract Reasoning; Audiences; Bias; Case Studies;
                Credibility; Data Analysis; Decision Making;
                *Evaluation; Evaluation Methods; *Evaluation Needs;
                *Evaluative Thinking; Evaluators; Logic; *Logical
                Thinking; Mathematical Models; *Models; *Persuasive
                Discourse; Problem Solving; Responsibility; Summative
                Evaluation; *Values
IDENTIFIERS     Glass (Gene V); Scriven (Michael)



ABSTRACT

Evaluation is an act of persuasion directed to a specific audience
concerning the solution of a problem. The process of evaluation is
prescribed by the nature of knowledge, which is generally complex,
always uncertain (in varying degrees), and not always propositional,
and by the nature of logic, which is always selective. In the process
of persuasion one must ascertain who the audience is and find a basis
of agreement on premises, both of facts and values, and on
presumptions. Two criteria for evaluation are: the most efficient way
to a given end, or the most effective use of available resources.
Quantitative evaluation methods involve three stages: (1) substantive
definition of the problem and its translation into a formal,
mathematical model; (2) compilation of information in terms of the
formal model and its formal, logical analysis; and (3) translation of
the formal conclusions back into substantive terms. Both formulation
and interpretation require good intuitive judgment. The evaluator and
the audience must employ their reasoning in a dialogue, and both must
assume responsibility, since evaluation is never completely convincing
nor entirely arbitrary. The logical arguments used in two works are
discussed. The works (Gene Glass' review of Michael Scriven's
instructional cassette lecture on "Evaluation Skills," and Scriven's
reply) are appended. (Author/CTH)



Reproductions supplied by EDRS are the best that can be made from the
original document.

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
NATIONAL INSTITUTE OF EDUCATION

"PERMISSION TO REPRODUCE THIS MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC) AND USERS OF
THE ERIC SYSTEM."



CSE MONOGRAPH SERIES IN EVALUATION

THE LOGIC OF
EVALUATIVE ARGUMENT

Ernest R. House

Center for the Study of Evaluation
University of California, Los Angeles



CSE MONOGRAPH SERIES
IN EVALUATION

SERIES EDITOR
Eva L. Baker

Center for the Study of Evaluation
UCLA Graduate School of Education
University of California, Los Angeles
Los Angeles, California 90024



CSE MONOGRAPH SERIES IN EVALUATION

NUMBER

1. Domain-Referenced Curriculum Evaluation: A Technical Handbook
   and a Case from the MINNEMAST Project
   Wells Hively, Graham Maxwell, George Rabehl, Donald Sension,
   and Stephen Lundin                                          $3.50

2. National Priorities for Elementary Education
   Ralph Hoepfner, Paul A. Bradley, and William J. Doherty     $3.50

3. Problems in Criterion-Referenced Measurement
   Chester W. Harris, Marvin C. Alkin, and W. James Popham
   (Editors)                                                   $3.50

4. Evaluation and Decision Making: The Title VII Experience
   Marvin C. Alkin, Jacqueline Kosecoff, Carol Fitz-Gibbon,
   and Richard Seligman                                        $3.50

5. Evaluation Study of the California Preschool Program
   Ralph Hoepfner and Arlene Fink                              $3.50

6. Achievement Test Items--Methods of Study
   Chester W. Harris, Andrea Pearlman, and Rand R. Wilcox      $4.50

7. The Logic of Evaluative Argument
   Ernest R. House                                             $4.50



This project has been funded at least in part with Federal funds from
the Department of Health, Education, and Welfare under contract
number NIE-400-77-0034. The contents of this publication do not
necessarily reflect the views or policies of the Department of Health,
Education, and Welfare, nor does mention of trade names, commercial
products, or organizations imply endorsement by the U.S. Government.



TABLE OF CONTENTS

Foreword

Chapter I: Evaluation as Argument
    The Coming Great California Earthquake
    Equivocality of Evidence: Certainty vs. Credibility
    Evaluation as Persuasion
    The Evaluation Audiences
    Premises of Agreement
    Quantitative Argument
    Qualitative Argument
    Ambiguity and the Development of an Argument

Chapter II: The Logic of the Argument
    Modes of Reasoning
    Analysis of Glass's "Educational Product Evaluation"
    Analysis of Scriven's Response to Glass's Evaluation
    Naturalistic Evaluation
    Objectivity, Validity, and Impartiality Reconsidered
    Evaluative Discourse: The Good Life (along the San Andreas Fault)

References

Appendix
    Glass's Evaluation--Educational Product Evaluation: A Prototype
    Format Applied
    Scriven's Response to Glass--Educational Product Re-Evaluation



FOREWORD 



The institution of schooling, existing as it does within a society
undergoing rapid changes, has multiple problems and only limited
resources of money, facilities, manpower, skills, and information to
meet them. Good information to guide both policy making and management
is an essential resource at all levels of educational decision making
but is especially important at the local district level; for it is
here that choices about curriculum, instruction, and delivery of other
educational services most directly affect the daily actions of
principals, teachers, and students.

There are those who assert that evaluation activities, if they are
based on a broad set of methodologies derived from the social sciences,
can provide valid, reliable, and relevant information for a range of
educational decisions: instructional, curricular, managerial, policy
making. Up until now, admittedly, evaluators have not been able to
provide this information in a relevant and timely manner. In the ten
years or so since evaluation has been formally called upon to bear
burdens both for accountability and policy making, the limitations of
the technology available to evaluators have become painfully apparent.
If evaluation can be considered a discipline, it is one which grows by
accretion; the agenda of unsolved problems, both theoretical and
practical, attracts researchers from diverse disciplines who apply
diverse methodologies and call their work evaluation.

The mission of the Center for the Study of Evaluation (CSE) is to study,
from a variety of perspectives, the act of evaluation as it affects
educational programs and services.

The case for evaluation is based on the premise that people will, if
they can, make changes based upon information. It is thus assumed that
decisions about policies, programs, students, or services will be more
rational if good information is available when needed. The simplicity
of this idea has been challenged by many who view decision making as
the result of influences far more diffuse than those within the
conscious control of the decision maker. They hold that political,
social, psychological, or organizational factors, while often
unarticulated, dominate the decision process. This monograph explores
the act of decision making from an analytical perspective.

Dr. House was a resident Visiting Scholar at CSE in 1976. During that
period, he worked alongside staff, provided counsel on a variety of
problems, and prepared the monograph presented here. Participants in
the Visiting Scholar Program include recognized scholars in the
conceptual and policy making areas of evaluation as well as
methodologists primarily concerned with the design, analysis, and
interpretation of empirical studies. Members of the practitioner
community are also invited to share their perceptions of how CSE
activities might assist school people in their evaluation tasks.

This monograph approaches the analysis of evaluation from perspectives
that have little reliance on quantitative origins. We welcome Dr.
House's point of view and expect it to generate discussion within the
field. It is our intent, through publications such as this, to
stimulate the membership of the field of evaluation to expand or to
consolidate positions related to the purposes, methods, and uses of
educational evaluation. We look forward to your comments.

Eva L. Baker 
Director



EVALUATION AS ARGUMENT



I choose the word "argument" thoughtfully, for scientific demonstrations,
even mathematical proofs, are fundamentally acts of persuasion. Scientific
statements can never be certain; they can only be more or less credible.

Joseph Weizenbaum
in Computer Power and Human Reason, 1976.

Generalizations decay.

Lee J. Cronbach
in Beyond the Two Disciplines of Scientific Psychology, 1974.

THE COMING GREAT CALIFORNIA EARTHQUAKE 

I sit in Los Angeles but wonder why I stay. A sudden one-foot uplift
has appeared along a hundred-mile strip of the San Andreas fault. Based
on seismic wave readings, a California scientist has predicted a major
earthquake for the Los Angeles area within a year (Science, May 1976).
Based on different readings, a radio evangelist warns of a major quake.
Both scientists and seers agree in their prophecies. Neither provides the
kind of information I need.

I talk to the natives about these ominous signs. Their response is shaped
by the necessity of living in such circumstances; they shrug their
shoulders. The President has been informed, but no one seems to know
exactly what to do. Washington officials suggest setting up a new array
of scientific instruments along the fault, although what will result
from more measurement is not clear.

Meanwhile the weather is perfect, the setting in the Santa Monica
Mountains splendid, the lifestyle sybaritic. Calculations of probabilities
of long-term seismic events do me no good; I need to know when the earth
will move in relation to myself.

The vocabulary of action is complex. Everyone agrees that information
somehow informs decisions but the relationship is not direct, not simple.
Often the more important the decision, the more obscure the relationship
seems to be. Consider the decision to marry. For most people, it is a long,
arduous process, one which takes shape over a period of time. No single
piece of information serves as a decision-point. Quite the contrary. The
decision proceeds slowly, almost imperceptibly, until it arrives. Reason
after reason is advanced and tried out. Finally, a multiplicity of arguments
serves as a rationale for the decision, which is often made long before all
the arguments are advanced.

I wish to thank Lee Cronbach, Bob Ennis, Gene Glass, and the CSE staff for
detailed comments on the monograph.

The most significant decisions are those that have long-range
implications but defy easy extrapolation, that are so entangled with
everything else that they resist precise formal analysis. To those we are
forced to apply our intuitive logic, our common sense. It is in the
nature of these complex problems that knowledge about them is limited,
that it is less than determinate. In the face of uncertain knowledge, the
task of entangled decision making becomes less one of absolutely
convincing ourselves with proofs than one of persuading ourselves with
multiple reasons. The criterion becomes not what is necessary but what is
plausible.

EQUIVOCALITY OF EVIDENCE:
CERTAINTY VS. CREDIBILITY¹

Why, then, do government officials, the public, and even members of
the evaluation community call for definitive proof of the success of
educational programs? There is a tradition as old as Descartes which says
that the only knowledge is that which is certain. Descartes's method of
analysis was one of total skepticism: to doubt everything that could be
doubted. In his search for certain knowledge, he arrived at the
self-evident as the ultimate mark of reason. For something to qualify as
knowledge it had to start from clear and distinct ideas and be extended
by deductive proofs. Propositions so derived were thus necessary and
compelling to the intellect; they could not be rationally denied.

This method excluded the merely credible from consideration as
knowledge. In the Cartesian ideal, the only true reasoning is analytic.
Formal deductive logic, the method of proof used in mathematics, is the
method par excellence. Knowledge can be reduced to self-evident
propositions. In certain knowledge there can be no disagreement. As
Descartes wrote, if there is disagreement over a matter between two men,
one of them must surely be wrong. There is a true and a false, and logic
works by compelling proofs to determine which is which.

Later, those who pursued this line of reasoning confronted the fact that
rational men often seemed to reason differently and arrive at
contradictory conclusions. Some of Descartes's own propositions looked
particularly suspicious. Pascal introduced the explanation that such
disagreement as well as the reluctance to accept necessary conclusions
was a result of irrationality. Man was seen to possess an irrational side
which often led him astray in his search for knowledge. The apparent
irrationality of those who do not accept conclusions which others
perceive as compelling is a common motif in contemporary evaluation.

From the Cartesian perspective, certain knowledge can be obtained
primarily by deductive processes and it must lead to absolute conviction.
Such reasoning may work in geometry, but it does so by excluding most
of the sensate world. As Hume pointed out, our beliefs, even in concepts
as basic as causality, are not certain when a thorough skepticism is
applied to them. Deductive reasoning succeeds in producing certain
knowledge primarily by eliminating most of the everyday world.

¹For this distinction and many other ideas in this paper, I am indebted
to Perelman and Olbrechts-Tyteca's excellent modern work on
argumentation, The New Rhetoric: A Treatise on Argumentation, University
of Notre Dame Press, 1969. 566 pages.

The sensate world was epistemologically salvaged for our use by John
Stuart Mill. Just as logicians had constructed formal deductive logic by
reflecting on the nature of mathematical proofs, Mill reflected on the
associationist psychology of his time and formulated an inductive logic
that purported to introduce certainty into inductively-derived knowledge.
To do this Mill made several assumptions that still pervade survey
research today. According to Hamilton (1976), the axioms include the
following:

• There is a uniformity of nature in time and space. This lends to
inductive reasoning the same procedural certainty as to conclusions
drawn from syllogistic logic.

• Concepts can be defined by direct reference to empirical categories
and laws of nature can be inductively derived from data because of
the above.

• Large samples can suppress idiosyncrasies and reveal "general
causes."

• The social and natural sciences have the same aim of discovering
general laws (which provide a basis for explanation and predictions).

• The social and natural sciences are methodologically identical.

• The social sciences are merely more complex.

Thus, Mill contended that certain knowledge was derivable from
inductive reasoning as well as from the deductive. One could define
categories and relate them to each other by now familiar techniques. In
fact, Mill concluded that the inductive method was the only way of
discovering new ideas since deductive logic could only reveal what was
already there. (Mill was so certain of his method that he contended that
ethical principles could also be derived by inductive reasoning and
hence had a scientific base.)²

²I have discussed the powerful effect utilitarian ethics has had on the
practice of evaluation in a paper entitled "Justice in Evaluation" in
Evaluation Studies Annual Review, Gene Glass, editor, Beverly Hills, CA:
Sage Publications, 1976. At the end of his masterpiece on inductive
logic, Mill considers the logic of a "practice" or "art." "There must be
some standard by which to determine the goodness or badness, absolute
and comparative, of ends, or objects of desire. And whatever that
standard is, there can be but one: for if there were several ultimate
principles of conduct, the same conduct might be approved by one of
those principles and condemned by another; and there would be needed
some general principle, as umpire between them." John Stuart Mill in
A System of Logic, Harper, New York, 1893 (8th Edition).

This leads Mill to impose a single universal standard by which to judge
practical affairs, for the only alternative is by "supposing a moral
sense or instinct" or "intuitive moral principles." General ethical
principles can only be known by induction. Since inductive

Mill's first assumption is the important one. In Mill's own words, "The
universe, so far as known to us, is so constituted, that whatever is true
in one case, is true in all cases of a certain description; the only
difficulty is, to find what description" (Mill, 1848). How familiar that
idea is to anyone who has engaged in survey research and how fallible
the inductive logic on which it is based!

The procedure of reasoning from "some" to "all" is clearly a logical
fallacy. Each confirming instance is supposed to make a hypothesis more
likely. Yet if the hypothesis is "All men are less than 100 feet tall" and
one finds a man 99 feet tall, this is a confirming instance that weakens
the hypothesis considerably rather than strengthens it (Scientific
American, March 1976). Does every day that goes by in Los Angeles without
the predicted great quake make it more or less likely? It is also quite
possible in statistical studies to confirm a hypothesis by two independent
studies and yet disconfirm the hypothesis by using the total results of the
two studies taken together. (See Simpson's paradox in Martin Gardner,
"Mathematical Games," Scientific American, 1976.)

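The reversal described in the preceding paragraph can be made concrete
with a small sketch. The counts below are invented for illustration and
do not come from the monograph or from Gardner's column; the names
`studies`, `program_wins`, and `pooled` are likewise hypothetical. In
each study taken alone the program group has the higher success rate,
yet the pooled totals favor the control group.

```python
from fractions import Fraction

# Hypothetical counts: (successes, participants) for each group in each study.
studies = {
    "study_1": {"program": (81, 87), "control": (234, 270)},
    "study_2": {"program": (192, 263), "control": (55, 80)},
}

def rate(successes, total):
    """Exact success rate, kept as a fraction to avoid rounding issues."""
    return Fraction(successes, total)

def program_wins(groups):
    """True when the program group's success rate exceeds the control's."""
    return rate(*groups["program"]) > rate(*groups["control"])

def pooled(all_studies):
    """Add up successes and participants per group across all studies."""
    totals = {}
    for groups in all_studies.values():
        for name, (successes, total) in groups.items():
            prev_s, prev_n = totals.get(name, (0, 0))
            totals[name] = (prev_s + successes, prev_n + total)
    return totals

per_study = {name: program_wins(groups) for name, groups in studies.items()}
combined = program_wins(pooled(studies))

print("program ahead in each study:", per_study)
print("program ahead in pooled data:", combined)
```

The reversal arises here because the two groups are spread unevenly
across the two studies, so pooling weights the studies differently for
each group. Which comparison to trust is a matter of judgment about the
context, not something the arithmetic settles.
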
Nonetheless, in spite of serious flaws of logic, "science" based on
inductive logic seems to work with some degree of success. Certainty of
knowing, however, is lacking. Even the best established scientific facts
must be held as tentative. As one scientist put it:

The man in the street surely believes such scientific facts to be as
well-established, as well-proven, as his own existence. His certitude is
an illusion. Nor is the scientist himself immune to the same illusion. In
his praxis, he must, after all, suspend disbelief in order to do or think
anything at all. He is rather like a theatergoer, who, in order to
participate in and understand what is happening on the stage, must for a
time pretend to himself that he is witnessing real events. The scientist
must believe his working hypothesis, together with its vast underlying
structure of theories and assumptions, even if only for the sake of the
argument. Often the "argument" extends over his entire lifetime.
Gradually he becomes what he at first merely pretended to be: a true
believer. I choose the word "argument" thoughtfully, for scientific
demonstrations, even mathematical proofs, are fundamentally acts of
persuasion.

Scientific statements can never be certain; they can be only more or less
credible. And credibility is a term in individual psychology, i.e., a term
that has meaning only with respect to an individual observer. To say that
some proposition is credible is, after all, to say that it is believed by
an agent who is free not to believe it, that is, by an observer who, after
exercising judgment and (possibly) intuition, chooses to accept the
proposition as worthy of his believing it (Weizenbaum, 1976).

certainty presupposes a uniformity of nature, the resultant psychology is
deterministic. Morality is natural since only a naturalistic assessment
will allow scientific methods of proof. Hedonistic utilitarianism is the
only basis.

In a sense, Mill was preventing disagreement over moral issues since it
is always possible to reach opposite conclusions when there is no
previous agreement on a criterion. The result of this reasoning is
utilitarian calculation which conflates all human desires into a single
configuration and satisfies them by the criterion of maximum total
satisfactions derived. The judging is done by an "impartial spectator,"
who in modern times demonstrates his impartiality by employing
"objective" techniques of analysis.

EVALUATION AS PERSUASION

If even demonstrations in the physical sciences are fundamentally acts
of persuasion, inquiries in education are more so. Mill's assumption that
the social and natural sciences are methodologically identical seems much
more dubious today. Cronbach (1974), for one, doubts the advisability
of imposing physical science ideals in social science. In the physical
science paradigm, events are explained and predicted by "a network of
propositions connecting abstract constructs."

After reviewing twenty years of aptitude treatment interaction studies,
which were based on such a model, Cronbach concluded that social
phenomena are too open to interactions with other variables to support
stable generalizations. The positivistic strategy of fixing conditions in
which to reach generalizations assumes steady processes that can be
separated into independent systems for study, a fragile assumption in
social systems.

Cronbach has suggested interpreting data in context rather than trying
to arrive at generalizations. An observer in a particular setting can
describe and interpret effects within local conditions. Whereas
experimental control and systematic correlation ask formal questions in
advance, local observation is more open to the unanticipated. Short term
empiricism is sensitive to the context. In being context sensitive, the
researcher may give up some predictive power. He gives up constructing
generalizations and theory building and instead develops "concepts that
will help people use their heads." So Cronbach contends.

Evaluations themselves, I would contend, can be no more than acts of
persuasion. Although sometimes evaluators promise Cartesian proof and
use J. S. Mill's methods of induction, evaluations inevitably lack the
certainty of proof and conclusiveness that the public often expects. The
definitive evaluation is rare, if it exists at all. Even a scientific
methodologist as sophisticated as James Coleman is faced with continued
and trenchant criticism of his work. Subjected to serious scrutiny,
evaluations always appear equivocal.

Expecting evaluation to provide compelling and necessary conclusions
hopes for more than evaluation can deliver. Especially in a pluralistic
society, evaluation cannot produce necessary propositions. But if it
cannot produce the necessary, it can provide the credible, the plausible,
and the probable. Its results are less than certain but still may be
useful.

Proving something implies satisfying beyond doubt the understanding
of a universal audience with regard to the truth. To produce proof that
a universal audience comprised of all rational men would accept requires
overcoming local or historical particularities. Certainty requires
isolating data from its total context as, for example, in the terms of a
syllogism. Logical certainty is achievable only within a closed, totally
defined system like a game.

If evaluation is limited to certain knowledge provided by strict
deductive and inductive reasoning, it must abandon a great amount of
reasoning power that people ordinarily use in the conduct of their lives.
Such a limitation results from confusing rationality with logic. They are
not identical.

If absolutely convincing all rational men is too heavy a burden for
evaluation, persuading particular men is not. In place of the compelling
propositions derived from rigorous logic, one may substitute the
non-compelling arguments of persuasion. In place of the necessity of
self-evidence, one may substitute variable adherence to theses as
presented to particular audiences. The theses may be more or less
credible. The audience is free to believe or not believe after inspecting
the arguments and exercising its own judgment.

Persuasion aims at winning a particular audience to a point of view or
course of action by an appeal to the audience's reason and understanding.
For this purpose, uncertain knowledge is useful although the ideas
themselves are always arguable. The appropriate methods are those of
argumentation, which is the realm of the "credible, the plausible, and
the probable" rather than the necessary (Perelman and Olbrechts-Tyteca,
1969).

Argumentation is contrasted to demonstration. Demonstrations rest on
formal logic which avoids ambiguity by the internal consistency of its
symbol system. In deductive logic the origin of the axioms is extraneous.
When one moves from deduction to induction, all manner of issues become
arguable, such as the validity of measurement. But the search is
still for "certain" knowledge.

In evaluation, the social and psychological contexts become particularly
relevant and the knowledge less certain. Under those conditions
argumentation aimed at gaining the adherence and at increasing the
understanding of particular audiences is more appropriate. Persuasion
claims validity for only particular audiences, and the intensity with
which particular audiences accept the evaluative findings is a measure
of this effectiveness. The evaluator does not aim at convincing a
universal audience of all rational men with the necessity of his
conclusions.

Persuasion is directly related to action. Even though evaluation
information is less certain than scientific information addressed to a
universal audience, persuasion is effective in promoting action because
it focuses on a particular audience and musters information with which
this audience is concerned. Personalized knowledge that induces people to
stop smoking may be different from scientific generalizations linking
smoking to heart disease or cancer. Finding out about the heart attack
of a close relative is more likely to induce one to exercise than are
charts and tables. Evaluative argument is at once less certain, more
particularized, more personalized, and more conducive to action than is
research information.

In summary, evaluation persuades rather than convinces, argues rather
than demonstrates, is credible rather than certain, is variably accepted
rather than compelling. This does not mean that it is mere oratory or
entirely arbitrary. Because it is not limited to deductive and inductive
logics does not mean that it is irrational. Rationality is not equivalent
to logic. Evaluation employs other modes of reasoning. Once the burden of
certainty is lifted, the possibilities for informed action are increased
rather than decreased.



CHART I:

Contrasts Between Evaluation as Argumentation
and Evaluation as Demonstration

Evaluation as                 Evaluation as
Argumentation                 Demonstration
--------------------------    --------------------------
Persuasion                    Absolute conviction
Credibility                   Certainty
Non-compelling                Necessary
Variable adherence            True or false
Particular audience           Universal audience
Dialectical reasoning         Analytic reasoning
Informal logic                Formal logic
Reflective                    Calculative
Action-oriented               Theory-building
Tacit knowledge               Explicit knowledge
Knowledge in heads            Knowledge in propositions
Ambiguous                     Clear and distinct
Concrete                      Abstract
Arguable                      Definitive
Direct experience             Indirect indicators



THE EVALUATION AUDIENCES

If persuasion becomes the aim of evaluation, the audiences to whom the
evaluation is addressed are important. For years evaluators have been
counseled to think of their audiences and the kind of information the
audiences will need. What is relevant for one group may not be relevant
for another. Argumentation presupposes that a "community of minds"
exists, that there is intellectual contact, and that there is agreement on
at least a few issues on which deliberation is to begin.

There must be a common language and a desire on the part of the
evaluator to persuade the audiences and to take their concerns seriously.
Often these conditions are not met. The audiences are misconceived or not
taken seriously. It is not uncommon for the evaluator to muster
information appropriate to an audience of psychologists but which has
little meaning to a teacher or a government official.

There are at least three general types of audience: the universal
audience, a single audience with whom one engages in dialogue, and
oneself as an audience. Argumentation with a universal audience strives
to gain the adherence of every rational person. Conceptually the universal
audience consists of all men at all times so the arguments must be timeless
and free of context.

The agreement of a universal audience is likely to be secured by formal
logical reasoning based on self-evident concepts. Thus the tighter the
experimental design, the more convinced a far-removed universal
audience will be of the cause and effect relationship, regardless of the
context. A particular audience closer to the scene may assume cause and
effect without such proof. Of course, the universal audience is not
"aggregatable" at any given time but various elite groups in fact serve
as a surrogate for it. Perhaps philosophers more than most represent this
type of audience. The arguments that move philosophers are not always the
same as those that move teachers.

The more an argument is directed toward a universal audience, the less
"arguable" it is. There is little to argue about in pure deductive logic.
Evaluation techniques are often presented as being non-argumentative,
as, for example, being based on valid and reliable instruments, as
employing sound statistical procedures, and so on. In fact, all
statements made on the basis of an evaluation are subject to challenge
and are arguable, if properly challenged. The more technical and
quantitative the evaluation, the less a naive audience will be able to
challenge it and the evaluation will appear to be more certain than it is.

In evaluations using statistical metaphors, one can argue that treatment 
effects differ because there is a probability that two mean test scores 
belong to different populations and, hence, that the experimental pro- 
gram is better than the control. The extensive use of numbers in the 
statistical procedures and the test scores gives a semblance of certainty 
and unequivocality to evidence.
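
As a minimal sketch of the kind of statistical argument just described, the
comparison of two mean test scores can be reduced to a single number. The
scores below are hypothetical, invented only for illustration, and the choice
of Welch's t statistic is an assumption of this sketch rather than a
procedure named in the text.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic for the difference between two sample means."""
    # Standard error of the difference, allowing unequal group variances.
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical test scores, invented for illustration only.
experimental = [78, 85, 90, 72, 88, 81]
control = [70, 75, 80, 68, 77, 74]

t_statistic = welch_t(experimental, control)
print(f"mean difference = {mean(experimental) - mean(control):.2f}")
print(f"Welch t = {t_statistic:.2f}")
```

The statistic says nothing about whether the test was valid, whether the
groups were comparable, or whether this procedure suited the data. Those
remain premises that an audience may accept or challenge.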

Actually many assumptions lie concealed behind the numbers (as indeed
behind every evaluation). One can almost always challenge the
validity of the tests, the appropriateness of the statistical procedures,
and the control of the experimental design. The challenge does not
invalidate the evaluation. But once the premises are challenged, the nature
of the evaluation as argumentation becomes apparent. The evaluator may
defend his study either successfully or unsuccessfully. In any case, he must
resort to nondeductive and more equivocal reasoning if he is to defend it.
Although the evaluation has the appearance of appealing to the definitive
rationality of the universal audience, it ends in direct appeals to particular
audiences. I believe it is impossible to construct an evaluation otherwise.
Even a broad-based evaluation operation like Consumer Reports, which
uses "objective" procedures and sophisticated experimental designs to
evaluate consumer products, is an appeal to particular audiences. Its
arguments, directed at the upper-middle class, have little meaning for
either the lower classes or the upper classes, and its evaluations are little
heeded by them.

Thus the situation the evaluator faces is almost always an appeal to
particular audiences which he can define with some precision. If he cannot
define his audiences, the evaluation is indeterminate. He must address
issues and construct arguments that appeal to particular audiences.
Furthermore, the audiences are likely to be a composite of several groups,
which complicates his task considerably. Effective appeal to particular
audiences changes the limits of applicable rationality. One is not confined
to the most restrictive modes of reasoning. If evaluation becomes more
equivocal, it also becomes more possible.

One ideal of two-party argumentation is embodied in the Socratic
dialogue. The dialogue develops as a rigorous chain of reasoning between
a questioner and a responder. The one-person audience is persuaded by
getting him to agree on certain principles point by point. The audience's
particular concerns are ultimately addressed in the interaction. The
Socratic dialogue is also powerful to third parties who might read it (see
Scriven's goal-free dialogue, Scriven, 1973).

The actual audience most evaluators face seldom consists of one person,
however. It is most often a composite. Some evaluation theorists have
suggested modes of evaluation in which the evaluator engages in frequent
exchange with the audience throughout the study (see Stake's 1973
"responsive evaluation" in which the evaluator is expected to respond to the
concerns of the program personnel). Whatever the mode of evaluation,
I would contend that evaluation which succeeds in being persuasive must
engage the audience in fundamental discourse, although that discourse
may occur in different ways.

Discourse conducted in this fashion is more than a mere debate in
which different points of view are presented by partisans. The dialogue
must be a discussion in which the parties seriously and honestly search
for mutual answers. This restriction severely qualifies the use of adversary
methods as persuasive devices since one may adjudicate a conflict without
persuading anyone of anything.

Legal procedures are important new means of encouraging evaluative
discourse (Wolf, 1974). Yet to be a successful discourse in which people
listen to one another, as opposed to a forensic contest, the acrimony in
court trials¹ must be reduced. One must avoid the bias sometimes evident
in courts of law. Admittedly, the distinction between a discourse for
discovering truth and mere oratory is not easy to make.

¹Ramsey Clark, the former U.S. Attorney General and a trial lawyer, is opposed to the
adversary process as a truth-discovering mechanism. "If there is a worse procedure for
discovering the truth, I don't know what it is." He claims that no one knows any more after
a criminal trial than before. The trial is simply a dramatization for the benefit of the jury.
Wolf and Farr (1976), as adversaries in their evaluation of the Indiana University alternative
teacher education program, were well aware of this difficulty and tried to reduce the
competition accordingly.

There is at least one other audience one can address in argumentation:
oneself. Some have reasoned that arguments addressed to oneself are
more likely to be valid and sincere since there is little advantage to fooling
oneself. If the "self" is conceived as the program staff, this means
formative evaluation. I have seen few really successful formative evaluations.
Either the information the evaluator collects is irrelevant to the program
staff or the evaluator is perceived as being too much of an outsider to be
a credible source.

Kemmis (1976) recently advocated "evaluation as self-criticism." He
sees the primary audience as being the program staff itself. Believing a
dialectic between knowledge and action to be the only way to improve
practice, he suggested that evaluation standards be derived from the
program participants themselves and that the data consist of the progress
as seen by participants. Evaluation thus becomes therapeutic
self-criticism. The ultimate goal is increased understanding and insight of the
participants themselves, which can then lead to effective action.

A follower of J. S. Mill would not think highly of this approach since
self-knowledge in his viewpoint would be likely to lead to rationalization
rather than to reason. Mill thought in terms of the self as audience only
insofar as it represented the universal audience. Propositions would be
established as either true or false. A more argumentative approach aims
at increasing the adherence of the audience rather than demonstrating
truth or falsity.

In fact, the difference in viewpoints is more fundamental. It is partially
a difference as to where knowledge exists. Does it exist in propositions
whose truth can be certified, or does it exist only in individual heads? The
view taken here is that knowledge exists only within the mind. The goal
of evaluation is not to arrive at a formal statement except as it stimulates
understanding in the mind of the audience.

In the argumentative approach the audience must also share
responsibility. Since the information is not compelling, the audience is free to
choose its own degree of commitment. It must actively choose how much
it wishes to believe. This requires an active testing of the evaluation by the
audience itself rather than a passive acceptance or rejection. The audience
must make a personal commitment and share responsibility. This rational
decision belongs to the audience, not to the evaluator.

PREMISES OF AGREEMENT

The development of an evaluation argument presupposes agreement on
the part of the audiences. The premises of the argument are the beginning
of this agreement and the point from which larger agreement is built. Just
as common sense admits unquestioned truths that are beyond discussion,
some of the major premises of an evaluation are tacit rather than explicit.

According to Perelman and Olbrechts-Tyteca (1969), there are two
classes of premises: the "real" and the "preferable." The real includes
facts, truths, and presumptions and generally claims validity vis-a-vis the
universal audience. On the other hand, the preferable is identified with
a particular audience and includes values, composite value hierarchies,
and value premises of a very general nature called "loci."

Facts and truths are those data and notions which are seen as agreed 
upon by the universal audience, i.e., held in common by thinking beings, 
and hence needing no justification. Whether a datum is a fact depends 
upon one's conception of the universal audience. If the audience changes, 
so can facts and truths. However, to hold the status of a fact or a truth
means that for the purposes of argument the datum is noncontroversial
and uncontested. If the datum is questioned, it loses its status as a fact 
and becomes itself an object of argument rather than an object of 
agreement. 

Where there is agreement on the conditions for verification as in modern
science, there can be many facts. Many data are not accorded the
status of "facts" by modern science. Polanyi (1958) pointed out how
science protects its own system of beliefs from inconsistency by denying
various data as factual which conflict with other beliefs. Thus for many
years science did not recognize hypnotic effects as occurring at all. These
data were not recognized as factual because they conflicted with the
current general scientific belief system. This belief system may change from
time to time, but regardless of what it excludes, arguments within the
belief system must be based on uncontested facts and truths.

Arguments also proceed from presumptions which do not have the full
authority and confidence of a fact or truth. Presumptions cannot be
proved but are nonetheless widely accepted as being tentatively true.
Many presumptions are connected to the concept of the normal. In
evaluations employing statistical models and metaphors, the assumption that
attributes within a population are normally distributed is almost
universally accepted. Perhaps an implicit presumption of all evaluations is that
the act of evaluation itself will somehow improve the program under
inspection.

The second class of objects of agreement is that of the preferable.
Objects of preference claim the adherence of only particular groups rather
than that of the universal audience. Values are the most conspicuous
examples. Agreement with regard to a value is an admission that there is
a specific influence on action or a disposition toward action that the
evaluator can make use of. Although relevant for a particular group, a
value is not regarded as binding on everyone.

In science, values enter primarily in the selection of objects of interest
for investigation since one cannot investigate the entire world (Polanyi,
1958) and possibly in the acceptance of scientific conclusions by overall
human judgment (Weizenbaum, 1976). But during most of the argument,
especially in the exact sciences, values are supposed to be excluded.
Ennis's (1973) analysis of cause and effect relationships leads one to
question this. In evaluation there is no question that values enter at every
stage. Values are used to persuade the audiences and to justify choices
to others.

Abstract values like truth, beauty, and justice have a universal appeal
only because they are so general and unspecified. Once their content is
determined, they appeal to certain audiences and not to others. Their
role is to justify choices where there is not unanimous agreement. For
example, in my analysis of justice in evaluation (House, 1976), I contrasted
three specific conceptions of justice: the utilitarian, the pluralist intuitive,
and justice-as-fairness. The purpose of the analysis was to justify
protecting people being evaluated and to promote more egalitarian criteria in
actual evaluations. The analysis was warmly endorsed by those who agreed
with such values and was not well accepted by those who did not, although
everyone is in favor of "justice."

Abstract values like justice can be contrasted with concrete values like
America or individual persons. Abstract values are more readily used for
criticisms as they are not respecters of individual persons. Concrete values
like fidelity and solidarity lend themselves more to compromise and
conservative argument. Of course values are not held exclusively by any
group. Audiences are perhaps better characterized by the relative weights
given to various values.

Various combinations of arguments can be compressed into a few
general groupings called "loci" (Perelman & Olbrechts-Tyteca, 1969).
The most common loci are those of quantity and quality. Arguments
grouped around the loci of quantity affirm that one thing is better than
another for quantitative reasons: greater number, higher degree, more
durability, etc. The effectiveness of means will often be justified by
quantitative loci. The idea of the normal and the norm are also based on
quantity.

Contrasted with quantity is the idea of quality. Something has high
value even though it defies number. Associated with quality is a high
rating of the unique. One can be in possession of truth while the multitude
is in error. For example, Scriven (1972) contended that the notion of
objectivity is not necessarily linked to the number of people holding an
idea, nor subjectivity to one person's perception, as is often believed.

Besides general agreements on facts and values, there are special
agreements particular to certain special audiences and particular to each
evaluation. To the extent that the evaluation is addressed to a technical
audience, that audience will share certain agreements and conventions.
A group of educational researchers is such a technical audience.
Evaluations directed toward a lay audience cannot rely on the same agreements.

Perhaps the most important agreements peculiar to a particular
evaluation are those derived from the negotiation that often precedes the
evaluation: agreements between sponsors, program personnel, and
evaluators. In this exceedingly important negotiation, agreement can be
reached on criteria, methods and procedures, access, dissemination of
results, and so on. Disagreement on these points can destroy the entire
credibility of the evaluation.

In summary, at the beginning of an evaluation, the evaluator must
build upon agreements with the audiences. These agreements may be
implicit as well as explicit. In fact, it would be impossible to specify all
these understandings, although it is quite dangerous to assume agreement
on important points where there is none. The evaluator must start from
where his audiences are, even though the beginning premises may not be
acceptable to other parties nor to the evaluator himself. Otherwise the
evaluation will not be credible and persuasive. There must be at least
some common understanding. If the basic values are too discrepant, the
evaluator has the option of not doing the study. Of course, those basic
understandings are subject to prevailing conceptions of decency and
justice in the society as a whole, and the evaluator has the option of
drawing upon these larger social understandings.

That is not to say that the evaluator should be in total agreement with
his audiences. Presumably there are areas of disagreement or there would
be no need for argument. Presumably the audiences wish to learn
something new or there would be no need for evaluation. But the evaluation
proceeds from areas of agreement to those areas where agreement is
problematic.

QUANTITATIVE ARGUMENT

The most popular approach to evaluation is the quantitative. Some see
it as the very essence of rationality and scientific method. Many good
evaluation studies have resulted from it, and many bad ones. Since this
approach is taught in the graduate schools and promoted in the literature,
there is little need to further extol its virtues; they are many. In this
section I would like to show that even quantitative methodology is
essentially argumentation and is subject to similar considerations. Properly
used, it can be a valuable tool of analysis; improperly used, it is dangerous.

Quantitative methodology is a body of mathematical methods and
measurement techniques available to the evaluator. The utility of the
methodology depends on similarities between the theoretical problems
dealt with by the methodology and the substantive problems dealt with by
the evaluator in the local setting. For his part, Cronbach (1974) has
already determined that the fit of the theoretical and substantive problems
is not a good one. The educational context is too complex.

In a probing analysis, a Rand Corporation mathematician (Strauch,
1976) examined the difficulties of quantitative methodology as it applies
to policy studies, i.e., questions arising from the government
decision-making process. According to Strauch, in so far as the methodology is
mathematical, it is a self-contained system the structure of which is
determined by the premises defining the system. Mathematical analysis is
the exploration of that structure as it follows logically from the premises.
The results are connected to the premises by logical inference. In the sense
that their validity can be determined on the basis of that chain of
reasoning, the results are "objective": there is no need to appeal to the
competence or judgment of the person who produced them, nor to the audience
to whom they are directed. The results are necessarily logical. In
argumentation, by contrast, the results cannot be totally separated from the
person who arrives at them.

The application of quantitative methodology to a substantive problem
uses a mathematical model as a simplified representation of the problem.
The results depend in part on the mathematical analysis but equally on
the fit between the model and the substantive problem. In the simplest
applications, such as in physical science, the substantive problems are
rigorously quantifiable. Experimental control enhances somewhat the
ability of the evaluator to make the substantive problem conform to the
mathematical model, i.e., randomness in statistical models. In such cases,
the conclusions are "objective" in the sense that they are subject to
independent verification on the basis of the logic and fit, without reference to
the judgment of the person who produced them. However, the more
behavioral or political the substantive problem, the more difficult it is to
define it unambiguously in mathematical terms. The link between the
substance and the model becomes tenuous.

Strauch identifies the following components of such a quantitative
study. Formulation involves defining the formal problem from the
substantive problem, then finding a mathematical model for the formal
problem. This is a process of reduction. Analysis involves computation
within the mathematical context defined by the model. It results in
mathematical statements. Interpretation means converting the statements back
into the formal problem and finally interpreting these conclusions within
the substantive context.

The validity of conclusions depends on both the logical validity of the
analysis and the validity of the linkages. While the logical validity can be
determined without reference to the subjective judgment of the analyst,
the linkages cannot. They are founded upon the subjective judgments of
the analyst. Both formulation and interpretation are subjective processes.
Formulation requires reducing the substantive problem to something
smaller that can be handled by the analysis and possibly adding some
assumptions which make the analysis easier but may be questionable on
substantive grounds, e.g., the independence of events.

Interpretation involves restoring the contextual considerations that have
been eliminated and possibly adjusting for the simplifying assumptions.
Both formulation and interpretation require considerable doses of
intuitive judgment. Hence the conclusions are not really "objective" as
claimed. (See the discussion of objectivity in a later section.)

The usual way of dealing with the subjective part of the methodology
is to ignore it. For one thing it is not such a great problem in the natural
sciences where quantitative methods have been so successful. Evidence of
"objectivity" there is taken as proof of objectivity in other areas. When
these links are challenged it becomes clear enough that quite arguable
premises underlie them.

Good insights are often derived from quantitative studies, but they
usually result from the analyst making the right intuitive judgments rather
than the right calculations. Those successes are often attributed to the
quantitative methodology itself rather than to judgment. Critiques usually
focus on the technical quality of the mathematical analysis rather than on
the quality of judgments associated with formulation and interpretation.
When quality of judgment is challenged, justification must rely on the
kind of reasoning common to all argumentation.

One result of underplaying the role of judgment is what might be called
"method-oriented analysis," according to Strauch. The analyst ignores
the complexities of the context and plunges ahead with his favorite
method. With superficial thought the methodology is applied in a
straightforward manner as if there were no problems of fit. A few caveats are
thrown in at the end suggesting that it is the reader's problem to decide
whether the fit is a good one.

In its extreme form there is a school of thought which Strauch calls
"quantificationism" which holds that quantification is a positive value in
itself. A quantitative answer is always better than a qualitative one. Any
problem can be reduced to a quantitative solution and no problem can
be properly understood until it is. Therefore quantitative methods should
be applied to all problems. This position may be a straw man in that few
people would really subscribe to it.

Such an attitude, which favors scientific methodology, is based on a
reductionism that treats a phenomenon as an isolated system, develops
a quantitative model for that system, and uses that model as a surrogate
for the phenomenon. As suggested previously, reductionism may be one
element of physical science not transferable to social phenomena.

The image the quantificationist projects is of a purveyor of objective
"fact" based on hard data. He takes no personal responsibility for
conclusions reached by his methodology since they are not of his making. He
has simply uncovered them. He is merely reporting the results of his
objective method. He disdains qualitative data as subjective.

This attitude is close to what Polanyi (1958) described as "objectivism"
in science. This is an attempt to define an objective method such that it
relieves the observer of any responsibility for his findings. Polanyi
contended, on the contrary, that the holding of a belief requires personal
commitment and responsibility even in science. Objectivism has sought to
represent scientific knowledge as impersonal.

Often quantificationism and objectivism also suit the decision maker
in that he may justify his decision by reference to a "scientific" finding.
It may help him avoid personal responsibility. Attempts to quantify
problems that are not quantifiable and to ignore the judgmental factors
eventually distort decision making.

Strauch suggests that one way to eliminate such distortion is to use
quantitative methods as a perspective rather than a surrogate for the
substantive problem. Accepting the mathematical model as a valid
representation of the substantive problem means using it as a surrogate. Using
the model by incorporating findings into knowledge one already has
means using it as a perspective.

For most substantive problems the audiences of the evaluation already
have well-developed images of their own. The quantitative analysis may
give the audiences an additional but not necessarily better or more valid
insight into the problem. The interaction between one's own images and
additional insights must take place in the heads of the audiences, the
decision makers or whomever. Using quantitative methodology as only
one perspective reduces the problem of the fit between the model and the
problem.

On the other hand, both the evaluator and the audiences must take
more personal responsibility for the findings since they do not necessarily
follow from the analysis. The conclusions cannot be justified entirely on
the basis that they follow logically from the assumptions. Evaluation of
individual assumptions must be supplemented by holistic evaluation of
the total.

Quantitative argument, then, should always be used in conjunction
with human judgment, and human judgment should be given the superior
position. The implications for quantitative argument in evaluation are
strong. Quantitative methodology should be seen to be based on human
judgments and on intuitive reasoning and should be justified accordingly.

QUALITATIVE ARGUMENT

In his paper on qualitative knowing, Campbell (1974) indicated that
scientific knowing is dependent on common sense and that particular
facts from either science or common sense are known only within the body
of a great many other facts. "The ratio of the doubted to the trusted is
always a very small fraction." Indeed, the knowledge of any detail is
context-dependent and, according to Campbell, qualitative knowing of
"wholes and patterns" provides the context necessary for interpreting
quantitative data. For example, generating alternative hypotheses
requires familiarity with the local setting, a qualitative act.

Campbell believes that qualitative knowing has been neglected in favor
of quantitative methods. At the same time he would prefer to see
qualitative and quantitative methods used together to cross-validate one
another. Quantitative methods, he believes, can provide insights that the
qualitative do not, in spite of the prior grounding of the latter. Also, since
all knowing is essentially comparative, he thinks qualitative techniques
like case studies could be improved by experimental design
considerations, which he would not see as being a part of quantitative methodology.

In rethinking the necessity and even the priority of qualitative knowing,
Campbell (1975) has reconsidered the "anecdotal, single-case, naturalistic
observation." Quantitative generalization will contradict such knowledge
at some points but only by trusting a much larger body of such
observations. In the classic paper on experimental design, Campbell and Stanley
(1966), the case study was described as having no basis of comparison
and hence providing no justification for drawing causal inferences.

Now Campbell has modified his position considerably, coming to
believe that the case worker makes many predictions on the basis of his
theory which he can disconfirm. The process is one of "pattern-matching"
in which aspects of the pattern are matched against observations of the
local setting. Campbell sees the single-shot case study as being a more
secure basis of knowledge than he did in the past.

How is it in Campbell's view that we can know anything? He traces
the current epistemological difficulties back to a quest for certainty in
knowing. The effort to "remove equivocality by founding knowledge on
particulate sense data and the spirit of logical atomism point to the same
search for certainty in particulars" (Campbell, 1966). Certainty was to
be established by defining "incorrigible particulars." This would result in
unequivocally specifiable terms and in a "certainty of communication."

Campbell now sees this brand of positivism as not being tenable in
either philosophy or psychology. Things out of context are not
interpretable. But how can one still "know" something from a group of events
which are each in themselves indeterminate? Campbell's answer is that
this is achieved through "pattern-matching."

In events of cognition like binocular vision, the eyes recognize common
objects by a process of triangulation. The more elaborate the pattern the
more statistically unlikely a mistaken recognition becomes. Through
memory various patterns can be compared. Pattern-matching itself
Campbell sees as a trial and error process. This is essentially analogical thinking
and Campbell sees it as being ubiquitous in the knowing process.

In fact, scientific theory is the most distal form of knowing, and the
relationship between formal theory and data is one of pattern-matching
with the error ascribed to the measurement of the data ("true" scores and
"estimated" scores) except when it is agreed that the theory is in need of
overhaul. There are two patterns to be matched, that of the theory and
that of the data. Acceptance or rejection of the theory is subject to some
criterion of fit between the two. Actually a theory is never rejected on the
basis of its inadequacy of fit except when there is an alternative theory to
replace it. It is the absence of plausible rival hypotheses that makes a
theory "correct."

Campbell sees these considerations as directly relevant to program
evaluation issues. "I believe that the problems of equivocality of evidence for
program effectiveness are so akin to the general problems of scientific
inference that our extrapolations into recommendations about program
evaluation procedures can be, with proper mutual criticism, well-grounded."
If I understand his position correctly, Campbell is arguing that
evaluation is a part of scientific inquiry and subject to similar epistemological
concerns. However that may be, in this paper at least, I have reversed the
ground-figure relationship somewhat by treating science as an argument
aimed at a universal audience and hence concerned with establishing
long-term generalizations, and evaluation as an argument aimed at
particular audiences dealing with context-bound issues. In any case, when
two of the leading scholars of measurement and experimental design,
Cronbach and Campbell, strongly support qualitative studies, that is
strong endorsement indeed.

In evaluation one may think of pattern-matching occurring not only
in the evaluator's mind as he constructs his study and inspects the fit
between his description of the program and the actual program itself, but
also in the minds of the audiences as they compare the evaluation study
to their own experience. The audiences themselves have images,
memories, and theories of the program under evaluation. In using the
evaluation as a perspective (in this case a verbal model), the audience matches
its conception of the program to the evaluation. Where it attributes the
error depends on the persuasiveness of the evaluation.

The audiences thus serve as independent points of validation for the
evaluation and must assume an active role in interpreting the evaluation
and personal responsibility for the interpretation. In some modes of
evaluation the audience may even be given explicit responsibility for approving
the final report (see MacDonald's 1974 democratic evaluation in which
program participants are given veto power over information about
themselves).

In Campbell's terms the basic pattern-matching process is analogical
rather than logical (although the process must surely involve many forms
of reasoning). In fact, one can go further than this. In an epistemology
based on removing equivocality and establishing certainty of knowledge
by defining "incorrigible particulars," deductive and inductive reasoning
are the proper way of relating these particulars. Formal logic depends on
unambiguous terms operating in a closed system.

To the extent that the terms are ambiguous and the system open (or not 
reducible to isolated subsystems), formal logic can be applied only
argumentatively. The reasoning must include other varieties of thought or one
must accept the fact that one cannot do rational analysis. Rational analysis
is possible in evaluation but only rarely will it assume syllogistic form.

AMBIGUITY AND THE DEVELOPMENT OF AN ARGUMENT 

In a sense ambiguity is an essential part of such reasoning processes.
Analogical or metaphorical use of concepts in evaluation will tend to
render the concepts more obscure. With Campbell, I would agree that
analogical thinking is basic to some forms of evaluation, and that
ambiguity is a vital element in communicating experience. "Naturalistic"
evaluation, for example, depends on being sufficiently ambiguous to
encompass past and future cases.

In fact, some philosophers would find even pattern-matching of theory
and fact as being too positivistic (Petrie, 1976). This is because
observational categories themselves are believed to be determined by the theory.
Without an independent observational base, there is no "objectivity" by
which to assess the theory. One way around this problem, according to
Petrie, is through metaphorical assertions.

The theory must prove itself against judgments of particulars.
Metaphorical assertions can bridge the gap between two separate frames of
reference by "showing" new relationships rather than by merely describing
them. Just as a teacher uses metaphors to link what the student knows
to what he does not know, scientists can explore new areas of interest by
such reasoning. Petrie points to Kuhn's "exemplars" as concrete examples
in science that have cognitive functions prior to specification of criteria
or rules for which the exemplars are illustrations. Kuhn (1970) contended
that science is actually transmitted by those exemplars rather than by
idealized rules of procedure. Similarly Petrie sees metaphor playing an
essential cognitive role in both scientific investigation and in learning.
Thus, qualitative evaluation may be rendered in explicit propositions,
similar to scientific theses but supported by qualitative data and
reasoning, or qualitative evaluation may be manifested in implied examples of
naturalistic style.

Conceptually the evaluative argument proceeds from the premises of
agreement shared by the audiences and evaluator towards the perspective
the evaluator wants the audiences to have. For each audience there are
sets of things that are admitted and any of these is likely to affect its
reactions. For example, an audience of educational psychologists will
share knowledge of a set of studies (and exemplars) likely to affect their
judgment of both the educational program and the evaluation. Those
studies are not shared by classroom teachers. The teachers do, however,
share direct classroom experiences.

The evaluator is faced with choosing themes and methods to advance
the argument and appeal to the audiences. To the degree he sees
evaluation as part of social science, he will use social science methodology. By
selecting some elements and presenting them to his audiences, he chooses
what is important and relevant to the evaluation. He endows these
elements with "presence."

"Presence acts directly on our sensibility" (Perelman & Olbrechts-Tyteca,
1969). The elements that are present to the consciousness assume
an importance underestimated by more rationalistic conceptions of
reasoning. The evaluator tries to verbally present what he considers
important, and by doing so he enhances its value. This means that the
evaluative argument is inevitably selective in its presentation and is
therefore always subject to charges of incompleteness and partiality. The scope of
the study can be enlarged but can never be complete in its coverage nor
complete enough to refute the charge that something has been left out.

Partiality also exists within the large scope given to interpretation. The 
essential ambiguity of what things mean, even in hard data studies, causes
numerous interpretation problems. Of course, the evaluator may choose
to portray the ambiguity of the situation rather than to impose particular
interpretations. Some British evaluators have pursued this idea most fully
by refusing to provide conclusions within their evaluation reports
(MacDonald & Walker, 1974; Parlett & Hamilton, 1972).* They contend that it is
the audiences' responsibility and privilege to interpret the study since only
they will know what it means for them. The audiences must draw
inferences for themselves based on their own experiences. The evaluator cannot
be so presumptive.

British thought has long been known for its affinity for the obscure and
ambiguous. As Madariaga (1949) has written in comparing the English
to the French and Spanish, "The sense of the complexity of life which
tends to make English thought concrete, tends to make it also vague."

Of course, it must be said that this concreteness and vagueness of
thought which respects life's complexities is exercised within a strong
system of traditions and roles inherent in British society. The cautious
nature of public pronouncements and documents, including evaluations,
is often accompanied by extreme personal opinions about the same events
and personalities. In any case, the British have traditionally led the way
in their appreciation of ambiguity and vagueness.

An idea is unambiguous only in a formal system in which every
unforeseen element has been excluded or in which the field of application has
been determined. One must be able to foresee all future cases. In such a
formal system, reasoning by calculation, e.g., in chess, is appropriate. By
contrast, in law a judge must make decisions that will affect future cases
he cannot possibly foresee.

Ambiguous ideas can be clarified by enumerating instances but the
ambiguity cannot be eliminated in this way. The context in which the idea
is used primarily determines its meaning and a new context will shift the
meaning somewhat. Analogical and metaphoric thinking apply terms to
areas beyond their normal context and in a sense create new unspecified
meanings. The elasticity of terms and ideas used in an evaluation means
that the ideas themselves may develop and be transformed within the
argument itself.

*Barry MacDonald has expressed the ultimate view in declaring that the more fully one
studies a situation, the more ambiguous it appears. If true, this raises questions about
the role of evaluation in decision making, although one might contend that evaluation will
only make decisions better, not necessarily easier.

The premises will often remain implicit in an evaluative argument.
There is not time to make explicit all the agreements on which the
dialogue depends. All this ambiguity in choice of premises, selection of data,
interpretation of meaning, and use of vague notions makes the argument
nonbinding.

Indeterminacy and unspecifiability are essential parts of evaluation,
whether based on hard data or soft. This ambiguity necessitates personal
judgments on the part of both evaluator and audiences. It also suggests
that overall judgment is more important than precise calculation in most
evaluative reasoning.



Chapter II 

THE LOGIC OF THE ARGUMENT

MODES OF REASONING

No doubt there are circumstances in evaluation where formal logic
is applicable. For example, deductive logic is certainly appropriate in
determining the internal consistency of mathematical models and
inductive logic is indicated in problems of statistical inference. Where
appropriate, this reasoning should be applied. For the most part, however,
evaluators must rely on extra-formal modes of reasoning. I will enumerate
some of these techniques of argument based on Perelman and
Olbrechts-Tyteca's treatise (1969) on argumentation. The list is by no means
exhaustive of man's informal reasoning powers. In the next section I shall
illustrate the use of these arguments by an analysis of a well-accepted 
evaluation study. 

The techniques of argument presented here are divided into three types:
quasi-logical arguments, arguments based on the structure of reality, and
arguments establishing the structure of reality. The first of these types,
quasi-logical arguments, derive their credibility from their similarity to
formal logic or mathematical reasoning. However, it is only by a reduction
that the quasi-logical argument appears to be formal. The argument is
essentially non-formal rather than formal and must ultimately be defended
by resort to other forms of argument.

Quasi-logical Arguments

The first of these arguments depend on their similarity to logical
relationships. They include contradiction and incompatibility, identity and
definition, transitivity, and reciprocity. The other group of quasi-logical
arguments depend on their similarity to mathematical reasoning. These
are inclusion of the part into the whole; division of the whole into parts;
comparison; and arguments of probability.

Incompatibility. In a logical system two theses that contradict one
another show the system is logically inconsistent. The quasi-logical
analogue is incompatibility in which one is forced to choose between two
theses that are not logically but are practically incompatible because of
circumstances. In extreme cases holding incompatible theses may invite
ridicule, the argumentative equivalent of logical absurdity. For example,
in an evaluation the director of the project may present one view of the
project while a teacher working in it may present quite a different view.
The two viewpoints are not logically contradictory since both may be true
as viewed from different circumstances. Nonetheless, the incompatibility
may be an important point in the total evaluation. In fact, the director
whose view is incompatible with the views of others in the project does
begin to look ridiculous.

Total identity and definition. Insofar as definitions can be stated
unambiguously and unequivocally, they belong to systems of formal logic. As
soon as they are applied to real world problems, definitions become
quasi-logical. One must choose among many possible meanings. Only
purely conventional systems can escape these identity problems. For
example, validity is defined in at least five different ways ranging from a
general justification to the ability to predict one event from another. One
can employ any one of the definitions but the choice must be defended as
appropriate and applicable if challenged by someone.

Partial identity. The "rule of formal justice" requires that identical
treatment be given to beings or situations of the same kind. This provides
for consistency of action, the basis of formal justice. "Reciprocity" of
behavior rests on defining situations as symmetrical. These arguments
require partial reductions, such as in the prestige and status of the parties
involved, which of course depend on argued positions. For example, "It
was only fair that the teacher provide special assistance to the child since
she had already given extra help to others." More arguable would be
"They deserved equal grades since they had exerted the same effort,
although with far different results." These statements rest on definitions
of partial identities.

Transitivity. A is greater than B, and B is greater than C, so therefore
A is greater than C, but the basis of "greater than" is arguable. For
example, Program A is better than B because test scores are higher. A
must be better than C because B's test scores are better than C's. Of
course, the criteria for comparisons are arguable as is the transitivity of
the relationship itself. Program A may not be better than C even if the first
relationship holds.

The arguments based on similarity to mathematical reasoning include
the following:

Inclusion of the part in the whole. The whole is greater than each part. 
For example, "Having a higher total test score is better than a high score
on one of the parts because the total score includes the parts." 

Division of the whole into the parts. Exhaustive division into parts leads
to the conclusion that the part left is necessary in some way. "I will list my
biases for the study and against it." "Either we have a Type I error or a
Type II error."

Comparison. Direct comparison of objects is based on an idea of
measure but any standard of measurement is lacking. Criteria are often cited.
Choice always implies comparison. "Argument by sacrifice" is a form of
comparison: what sacrifice would one be willing to make to achieve an
end? Perhaps all evaluation is basically comparative.

Probabilities. Argument by probability and variability usually entails
a reduction of data to monistic and homogeneous values and to elements
by which they can be compared. But it is usually powerful because it
imparts an empirical character even when non-quantitative, e.g.,
Decision Theory, which requires that the decision situation be reduced to a
particular decision model.

Arguments Based on the Structure of Reality

An entirely different class of arguments is based on the "structure of
reality." Reality is sufficiently agreed upon and unquestioned, like facts
and truths, so that one tries to establish a connection between accepted
notions and those being promoted. These arguments can be more finely
classified as relations of succession, which relate a phenomenon to its
causes or consequences; and relations of coexistence, which relate an
"essence" to its manifestations, e.g., a person to his actions. Among the
sequential relations, in which time plays a major factor, are these:

Causality. Demonstrating causal links may be based on many different
methods and obviously plays an essential role in evaluative argument. The
attempt to establish a causal link may involve establishing a relationship
between two successive events, reasoning from a given event to a presumed
cause, or projecting a causal consequence as the result of an event. In any
case the causal statement requires certain value judgments (see Ennis,
1972).

Pragmatism. An event is evaluated by its consequences. Value of the
consequences is transferred to the cause. The value of the consequences
must be agreed upon or one must resort to other arguments to establish
their value.

Ends and means. Determination of the best means depends on exact
definition and agreement on the end pursued. Only values relating to the
end are likely to be discussed. In a technologically-oriented society,
ends-means arguments are particularly potent. Example: Behavioral objectives.
Programs are good which achieve these ends. Separating means and ends
allows maximum agreement by separating the ends and means
analytically, although it is doubtful if a particular means accomplishes only one
effect. Practically, ends and means are more closely entwined.

Waste. Since such an effort has been exerted to this point, it would be
a waste to give up now. "It would be a shame not to reanalyze this data
since it has been so costly to collect." "Develop the child's talent to the
fullest."

Direction. If we give in this time, where will it lead? The domino theory.
"Knowledge can be indefinitely increased. There is no limit to learning."

Unlimited development. More is better and can be obtained. 

Whereas sequential relations are on the same phenomenological level, 
relations of coexistence connect two objects or events in which one is more
basic and explanatory of the other. The order of events is of secondary
importance. These include the following:

The person and his acts. Our conception of a person is usually
influenced by his actions, though ordinarily the two are not equated as they
are in behaviorism. Interpreting an event by ascribing it to the personality
is common practice in evaluation studies. How the "intention" of the
person is handled is particularly critical. The intent is often inferred by
correspondence among actions. But there is always ambiguity. Most
attributions of motivation are examples of this type of argument.

Authority. Although rightfully excluded from demonstrations in logic,
since the logic must stand on its own, the prestige of the person making
an assertion is important in argument. It is essential in legal reasoning.
Only if the assertion is agreed upon by the universal audience and hence
considered a "fact" is it beyond the reach of authority.

"Objectivity" is often achieved by separating the person from his act,
e.g., taking the author's name off proposals before judging them. However,
the person may be the best predictor of the success of the project.
Impartiality may be sought by bias reduction techniques rather than through
complete severance of the agent from his act (see Scriven, 1975). In
argumentation and evaluation the relation between a person and his
assertion is important.

Person and group. "He did that because he's a behaviorist." This
category includes arguments expressing concern in maintaining or
establishing relations with others. Characterizing a person through his group
membership is far more common in evaluation than is realized. Not only
are quantitative studies set up to reveal differences among groups,
qualitative evaluations often interpret the social system under study as a set of
interacting groups. In addition, the evaluator is often at pains to
demonstrate his concern and/or impartiality by showing what groups he himself
does or does not belong to.

Acts and essence. What is a good director? A good director is one who
conforms to the ideal of a director. In the absence of such conformity
there is a "deficiency." The essence of an object under evaluation is often
defined by a set of intuitive criteria one would expect to apply. For
example, a "good project director" would be expected to be and to do certain
things. The evaluator may elicit this normally implicit set of criteria in
order to judge the director. The same thing can be done with a good
program, a good textbook, etc. The list of criteria is never inclusive and
is always arguable. Nonetheless, the list is often effective in persuading
the audience as to quality. Example: Consumers Union reports on
manufactured products.

Symbolic relation. Only members of a particular group believe in the
magical relationship between the symbol and the thing, such as a national
flag. Symbolic relationships are important in describing certain aspects of
social systems and statuses. These relations are somewhat different in that
they cannot be justified to others. Educators often attach such special
meanings to particular facets of their program and to particular
charismatic leaders within it. People and things become the objects of faith in
and of themselves. This is a common puzzle to the evaluator who may look
in vain for more material relationships underlying the faith.

Arguments Establishing the Structure of Reality 

The third class of arguments assumes the fewest premises in advance. 
These arguments rely neither upon similarity to formal logic nor argue
from the already agreed upon structure of reality. Rather they try to
establish reality. Example and illustrations do so by resorting to the particular
case. Analogies and metaphors do so by showing new conceptual
relationships to the audiences. This mode of argument is relied upon heavily in
"naturalistic" evaluation.

Example. Resort to example implies lack of agreement on a particular
rule but a prior agreement that one might eventually come to an
understanding. A series of examples induces one to generalize. Sometimes the
reasoning is from the particular to the particular with no rule being stated.
The examples operate implicitly. The technique of the "closed case" and
the legal "precedent" is built on such a technique. This argument values
the actual and the habitual. To be effective the example itself must be
accepted as factual.

Illustration. Whereas example is used to establish a rule, illustration
is used to clarify one and strengthen adherence to it. It promotes
understanding. Illustrations of forms of arguments in this section attempt
to clarify the categories but the categories are not dependent on the
illustrations.

Analogy. Analogy strikes a relation between two previously unrelated
spheres and is hence essential in invention and imagination. It develops
and extends thought.

Metaphor. Metaphorical assertion opens new realms of thought by
moving from the known to the unknown and by helping indicate things
unspecifiable in ordinary language. Metaphoric assertion is most used in
conjunction with examples and illustrations. How it works to extend the
audience's ideas will be discussed as part of naturalistic evaluation.

These techniques of argument are not exhaustive and are not intended
as a list of techniques from which to construct evaluations. Rather they
are meant to illustrate the kind of reasoning that is actually employed in
evaluations.

ANALYSIS OF GLASS'S
"EDUCATIONAL PRODUCT EVALUATION"*

I have chosen Glass's "Educational Product Evaluation: A Prototype 
Format Applied" (Glass, 1972) to analyze in terms of the arguments
enumerated in the last section. I selected this evaluation for several
reasons:

1. It is highly accessible, having appeared in the Educational
Researcher.

2. It is a succinct evaluation.

3. The authority of the author is unassailable.

4. It exhibits a variety and complexity of evaluative arguments.

5. I find it personally quite persuasive.

My technique will be to paraphrase Glass's work and to identify the 
arguments in parentheses as they occur. I would not contend that I have 
found all the arguments in Glass's work, that the ones I have emphasized 
could not be categorized otherwise, or that the types of argument I have 
enumerated in the last section are exhaustive. It would be impossible to
list all arguments or types or to classify them unambiguously. My purpose
is to illustrate from a very good piece of work that those arguments play
a critical role in evaluative reasoning. The overall logic of the Glass piece
is somewhat more complex than the arguments I have discussed, and I
will save it until after a discussion of particulars.

Glass begins with a brief introduction stating the tentative nature of 
evaluation techniques and describing what he intends to do. The body of
the paper is divided into ten parts. Part I is a description of the AERA
cassette recording he intends to evaluate, which is itself a discussion of
evaluation by Michael Scriven.



*Glass's evaluation has been reproduced in the appendix.

Part II lists the three goals of the product and evaluates them. Training
evaluators is good since there is a need for evaluation skills because of
legislation mandating evaluation (cause and effect). Producing a cassette
that can be used while commuting to work may or may not be desirable
because it may infringe upon a person's private time in unanticipated
ways (pragmatic argument: valuing an event from consequences).
Experimenting with new media is commendable if it is not "mere
technological tinkering" (person and his actions: intention of the actor). The
evidence will be whether the cassette is properly evaluated (intention
constructed from consistency of actions; person and his actions).

Part III describes where things stood as the evaluator entered. The
director, the topics of the tape, the lecturer, the subject matter, and
the initial copies have already been agreed upon. The vending of the
cassettes, the choice of materials, and marketing plans are not settled.
This signals where it is reasonable for Glass to focus attention.
Implicit is the argument that it would be a waste of the evaluator's and
audiences' time and effort to address issues already decided (argument
of waste).

Part IV is entitled "trade offs" and is a brilliant turn in the overall 
argument. Glass enumerates what could be purchased with the resources
used to produce the cassette: one day of training session for 100
researchers, printing of 20,000 copies of prose materials, a half-year
stipend for a research trainee, or four scholarships to AERA training
sessions for minority researchers. This is the trade-off for the sponsor,
the USOE. Trade-offs for the other major audiences (the director,
AERA, and the consumer) are also listed.

The reasoning begins by asking what would be given up by the cassette
approach (argument by sacrifice). It establishes the equivalence of the
trade-offs in terms of their being purchasable with the resources devoted
to the cassette approach (argument by identity). The trade-offs are also
equivalent in that they are all consistent with the producer's intent.

Without making explicit the reasons, Glass chooses the typescript
alternative as the trade-off "with the greatest leverage" (argument by
comparison). Why choose the strongest alternative with which to make further
comparisons? Implicit in the reasoning is the idea that one should choose
the technique which will best further the end of the producer (argument
by ends and means).

Having chosen the strongest competitor, Glass, in Part V of the study,
expands the cost comparison between the cassette and typeset approaches
to the fullest (arguments by comparison and sacrifice). In exploring cost
considerations, he argues that the cost would be worthwhile for groups of
10-15; that the tape is too expensive and could be cheaper, for which he
cites the Colorado audio-visual instruction department as authority
(argument by authority); that typescripts could be better stored; and that the
typescript's cost could be further reduced. All these arguments are
variations based on comparisons between the two approaches and what each
might cost under various contingencies.

Part VI is the "intrinsic" evaluation, labeled secondary by the evaluator.
It is an evaluation of the technical quality, content, and "utilization of
uniqueness" of the medium. This series of arguments deals with issues
that are secondary to the entire cassette versus typescript comparison but
which might be important to a potential consumer who wishes to purchase
the cassettes.

The evaluation of the technical quality and content is based on an
ideal of what the technical quality and content should be; deviations
from these ideals are deficiencies (argument by act and essence). The
evaluator lists criteria which he considers to be relevant and commonly
agreed upon, since he does not attempt to justify them. Technical quality
contains tape quality, recording fidelity, aesthetic quality, editing, and
packaging. Each criterion is accompanied by a judgment and a few
remarks enumerating observations on which the judgment is based.
Similar "a posteriori" criteria are applied to the content.

The second part of the intrinsic evaluation is of the "utilization of
uniqueness" of the cassette medium. This is again basically an argument
based on the "essence" of the cassette (act and essence). Two producer
claims are explored. The first claim, that one can stop the tape
advantageously, is refuted by the evaluator by counting the number of stops. The
second claim, that a significant number of people have cassette players and
time in which to listen to the cassettes, is confirmed by a mail survey to 100
AERA members (argument by probability). Knowing he is addressing an
audience of educational researchers, Glass reports the confidence intervals
in a footnote. Throughout the second part the dormant comparison with
typescript is utilized by refuting producer claims that reading typescript
cannot do the same things. Glass argues against the producers' "unique
features" claim for the cassette approach (argument by act and essence).

Part VII is the "outcome" evaluation and is labeled as primary by the
evaluator. The comparison between cassette and typescript is head-on
in terms of outcomes. He argues that even if the aural medium is as
effective in transmitting information, it is slower. This is a comparison
implying measurement. It is a comparison based on pragmatic
consequences (argument by comparison; pragmatic argument). Access is also
much slower on the cassette (argument by sacrifice).

Glass cites a review of experimental studies comparing the aural versus
visual mode as being inconclusive because relative efficiency depends on
several contingencies. This is non-contributory to his argument, other
than increasing the evaluator's credibility, but it allows Glass to describe
a particular study in detail which shows the superiority of visual learning
(argument by illustration).

Part VIII is a summary of conclusions and a separate set of
recommendations for each separate major audience. The recommendations are quite
explicit and specific in their direction. In fact, the recommendations
establish a hierarchy of actions each audience might take, depending
on contingencies. Part IX lists the special audiences who might benefit
from the cassette approach. The arguments are that cassettes may be
beneficial to sightless learners, large groups, and "Reverse Luddites." All
these arguments in Parts VIII and IX are variations of costs and benefits
(arguments of ends and means; pragmatic arguments).

Part X is unusual in its reflexiveness. It is entitled "Evaluating the
Evaluator" and explores the evaluator's own biases. Of course, simply
undertaking such a consideration enhances the evaluator's credibility.
Glass points out that evaluations themselves involve costs, especially in
destroying a sense of community (arguments of person and group). In this
case, he undertook the study because he was asked by the product
developer (person and act). He establishes his credibility by showing that he
took actions which are inimical to his own interests, thus giving evidence
of his impartiality.

Glass divides his motives into the exclusive categories of motives for a
favorable evaluation and motives for an unfavorable evaluation (argument
by division of whole into parts). Biases for a favorable review derive from
the fact that Glass is a member of the AERA Executive Board, the
benefactors, and the fact that the producers are his close colleagues (argument
of the person and his group).

Motives for the unfavorable review are that he declined to participate
himself on the grounds the cassette approach is not cost effective and the
fact that he was once beaten in table tennis by the project director. These
arguments depend on the construct of the person behind the acts (argument
from person and acts). He concludes the evaluation by pointing out
that he has collected no data on attitudes toward the product or on its
effectiveness. He leaves the audiences to draw their own conclusions on
the balance of biases and overall credibility.

The overall structure of the study is well worth examining. It consists
of a complex form of argument called the "double hierarchy" argument
(Perelman & Olbrechts-Tyteca, 1969). The double hierarchy argument
consists of two hierarchies of values or objects which are usually connected
by relations from the structure of reality. For example, Leibniz's statement
that "since [God] cares for the sparrows, he will not neglect reasonable
creatures who are far dearer to him" is based on implicit hierarchies of
creatures and God's caring and connected by implied cause and effect.
Double-hierarchy arguments often take the forms of "if . . . then"
conditional statements and are usually implicit.

The overall logical structure of Glass's evaluation seems to consist of
a double hierarchy argument. One hierarchy is a hierarchy of costs. The
other hierarchy is one of benefits. The two hierarchies are connected by
a means-ends relationship. In fact, the entire study is based on
establishing this logical structure and orchestrating the subarguments
within the grand overall design.

For example, after the context of the study is defined by the product
description, the producers' goals, and the entry point of the evaluator,
Glass builds a hierarchy of trade-offs in Part IV. In Part V he selects the
strongest competitor and builds the cost comparison hierarchy between
the two approaches. In Part VII he builds the benefits hierarchy, again
based on comparisons between the two approaches. The means-ends
relation connects the two hierarchies. It demands that the best means be
chosen to accomplish given ends. The contingencies in Parts VIII and IX
are explorations of what would happen if one moved up or down the cost
hierarchy or the benefits hierarchy.

Thus Glass has conducted a cost-benefit analysis without precise
measurement of the costs or the benefits. And it is persuasive. It is
compelling, I think, because of the integration of the arguments. All the
arguments work economically within the overall structure. There is very
little extraneous movement. Only the introduction and the final section on
the credibility of the evaluator do not contribute directly to the overall
argumentative line. Aesthetically these two sections are appropriately
placed at the beginning and end. One is inclined to agree with Polanyi that
the ultimate test of truth is the coherence and beauty of the structure.

The most difficult part to handle in the overall design is Part VI, dealing
with the quality of the cassette. Glass was actually asked to evaluate the
cassette itself. I would surmise that the basic problem of intellectual
incompatibility from which the evaluation grew was that the cassette itself
was good but Glass did not see the investment as being worthwhile. He
redefined the problem such that he was evaluating the cassette approach
rather than just the cassette itself. Yet he could hardly evaluate the
product without direct evaluation of the tape. Also, one of his audiences
had to be potential consumers who might buy the tape and not just AERA
board members who wanted to know if the entire activity was worthwhile.
He labeled the cassette evaluation secondary as opposed to the primary
outcome evaluation. Aesthetically he also de-emphasized it by tucking
the intrinsic evaluation into the middle of the overall presentation.

In addition to the logical coherency of the evaluation, it is also
persuasive because the premises of agreement are well chosen for the
audiences. Costs/benefits are powerful values for the audiences and
means-ends relations are nearly unquestioned by people versed in the
teleology of utilitarian ethics. Glass takes the audiences from values they
agree with to conclusions they may not have accepted initially. He is keenly
aware of who his audiences are, even addressing each directly and giving
each different recommendations. One may suspect, however, that his
arguments are not equally persuasive to all. Some groups are likely to
harbor values and conditions untouched by the evaluation. Yet he has solved
the problem of composite audiences with differing demands beautifully, both
logically and aesthetically.

How would one deny such an evaluation? One could attack the basic
argumentative structure by denying the equivalence of the trade-offs and
by questioning the selection of the strongest competitor, thus denying
the means-end relationship. One could attack the costs and deny the
comparative benefits that result from the typescript approach. Attacking
the secondary evaluation of the tape quality itself does little good since it
is not integral to the overall logic of the study. Glass can concede points
there and still arrive at negative conclusions. One can also claim the
evaluator is unduly biased and attack the credibility of the study in that
way, although Glass's discussion of his own biases makes it more difficult
to do. Any evaluation is assailable, even one that is highly persuasive.

It is noteworthy that in this masterful evaluation, Glass has used most
of the types of argument previously enumerated. He relies heavily on
arguments from the structure of reality, especially sequential
relationships linking phenomena to consequences such as ends and means
arguments, and on quasi-logical arguments such as comparisons. He has very
few arguments which attempt to establish the structure of reality such as
examples and metaphors.

Formal data collection procedures are used only moderately, and where
employed do not contribute critically to the import of the evaluation. Most
data consist of already accepted "facts." Formal data collection
procedures are not essential to evaluation; argumentation is.

ANALYSIS OF SCRIVEN'S RESPONSE TO
GLASS'S EVALUATION*

This section was written five months after the rest of the paper
because I did not know of Scriven's response to Glass's evaluation until
informed of it by Glass. The timing is important because Scriven
attacked Glass's evaluation in precisely the way it was suggested in the
previous section one would have to do. One must deny the equivalence
of the trade-offs and question the selection of the strongest competitor,
thus denying the means-end relationship, as well as attack the costs and
deny the comparative benefits of the typescript alternative. This is what
Scriven does, and comparing his reasoning to Glass's is interesting.

Scriven (May, 1972) begins by saying he has been invited to respond
to the Glass evaluation of his cassette (intentions of the actor: argument
relating a person and his acts). He sketches the background conditions
surrounding his decision to redo the entire second cycle rather than revise
the first product. The argument is laid out rationally as a choice among
three alternatives (pragmatic argument). However, Scriven devotes so
much space to developing the context of his action that he clearly wants
his audiences to understand his motivations (argument relating a person
to his acts).

*Scriven's response has been reproduced in the appendix.

Scriven also has a much larger problem with bias than does Glass
because Scriven is responding to an evaluation of his own product and is
immediately suspect. Interestingly he argues that the direction of bias is
so obvious that it can do no harm (pragmatic argument). This argument
attempts to sever the relation between the act of counter argument and
the motivation (bias) of the actor (argument relating a person and his
acts). It is an attempt to reduce perceived bias on the part of the actor.
Scriven buttresses his impartiality by showing his ability to distinguish
between "excuses" and "criticisms" (in itself an argument by division of
whole into parts). While there are several kinds of arguments in the first
two sections of his response, Scriven organizes them towards disassociating
himself from bias.

In the third section Scriven turns to a consideration of the conclusions
about the hardware. He accepts most of Glass's criticisms (enhancing his
credibility) but dismisses the desirability of cheap tapes because they will
wear badly (pragmatic argument) because of heating and friction effects
(cause and effect). He also dismisses distortion in the cassettes if they
are played on the proper equipment. The mention of Advent and MacIntosh
equipment immediately captures the audiophiles in the audience
and shows that Scriven knows what he is talking about (argument by
authority). Now it is clear why he started with an analysis of the relatively
unimportant area of hardware. Scriven has better information in this area
than does Glass. It is also an attack on Glass's cost analysis in terms of
the size of the audience reachable and the cost of the tapes.

There is little to argue about in software since Glass's evaluation of
Scriven's tape is a string of "excellences." Scriven dismisses the criticism
that lack of citations is a handicap, based on the feedback he has received
from the field (argument by probability).

Then comes Scriven's basic attack on the logic of Glass's evaluation.
Scriven concedes that "the general procedure of really working to get
estimates of comparative cost effectiveness seems to me absolutely correct
and indeed the method of choice in all educational evaluation." But he
is not in agreement with Glass's assessments of the costs and benefits and
particularly the way Glass has them linked together. Scriven's basic thrust
is that Glass has chosen the wrong competitor (the typescript) for
comparison.

Scriven contends the cassette serves different ends than does the
typescript. It is more useful than listening to a car radio and it can be a
cheap surrogate for a visiting lecturer in a course. (These two exclusive
purposes are established by definition.) These arguments deny the
equivalence of outcomes that Glass has established (the argument by
identity). The cassettes accomplish different ends and therefore the
trade-offs are not equivalent. The cassette is a motivator in places where
written material is not (pragmatic argument). Also the costs are the same as
for commercial tapes (comparison with the norm).

Scriven admits cost, speed, and replay advantages for the typescript,
but again the cassette introduces a new element the written material does
not. Scriven gives several reasons for using the cassette in class: hearing
the authority himself, several speakers are better than one, and the tape
provides variety (arguments of pragmatism, the whole greater than its
parts, and unlimited development). While not generally superior, it is
"repertoire-enlarging." Notice that overall, Scriven is arguing for the
uniqueness of the cassette while Glass is arguing that the typescript
accomplishes more of the common goals (loci of quality versus loci of
quantity).

The cost trade-offs Scriven treats as problematic. Perhaps the funds
for producing the cassettes were not available for anything else under the
circumstances (argument of waste?). Even if they were, AERA should be
doing experimental things (act and essence, with "being experimental" an
implicit criterion for AERA), and this is a reasonable experiment, given
other attempts (argument by comparison with the norm). Also it is better
to try it in education if it is to be used in education (partial identity?).

But Scriven's main objection to Glass's evaluation is the object of
comparison. "So my principal criticism of the Glass evaluation concerns the
choice of the main crucial comparison. It should not have been the
typescript but just the better content-cheaper package cassette." The
disagreement is not merely one of comparison. The disagreement is whether
to connect the costs and benefits by a means-ends argument, which
suggests the best competitor (the typescript), or by a pragmatic argument,
which suggests a lower competitor.

Scriven insists on the uniqueness of the medium. Although Glass has
refuted the uniqueness argument by counting the number of times Scriven
stopped the tape, Scriven argues he is not persuaded because Glass did
not offer what would be a unique utilization. Scriven switches to
"comprehensibility" as the uniqueness factor, admitting that the number of
stops on the cassettes is a poor indicator of this criterion (argument by
act and essence).

In the last section of his response, Scriven suggests that Glass's "Reverse
Luddites" category of potential audiences is too narrowly conceived and
that there are many normal people who would benefit from a cassette
because there are people who prefer listening to reading (arguments by
frequency). In fact, everyone does so at some time of the day (cause and
effect). These arguments are supported by Scriven asking his wife
(argument by illustration) just as Glass used a study by one of his graduate
students. Finally Scriven says one must also consider the additional
benefits of what has been learned by the intermediary population, himself
and the producer (pragmatic argument). All these arguments increase the
benefits, thus making the cost benefit ratio more acceptable.

The overall logic of Glass's original evaluation is a double hierarchy
argument of costs and benefits linked by a means-ends relationship.
Scriven sees this structure clearly and accepts the basic comparison of
costs and benefits as the method of choice for all evaluations. He tries to
show how the costs are not extravagant and unreasonable and that the
benefits of the cassette are significantly underrated by Glass. But the main
criticism is to challenge Glass's means-ends argument by substituting a
pragmatic argument as the link. The means-ends argument requires that the
cassette be compared to the best alternative available. Scriven's pragmatic
argument requires only that the cassette be better than what now exists
among other cassettes. Scriven's strategy is to claim unique features for
the cassette so it does not have to compete totally head-to-head with the
typescript approach on each dimension. Scriven is arguing for a
qualitatively different field of comparison.

The pragmatic argument in its elemental form consists of evaluating an
event in terms of its consequences. The means-ends argument, on the
other hand, depends on agreement on the ends. Determining the best
means to the ends depends on exact definition of the ends pursued. Values
not related to the ends are eliminated from consideration. If the ends are
exactly defined and agreed upon, the determination of the best means
becomes a technical problem. Such reasoning, appropriate for the
technical disciplines, is quite different from everyday reasoning.

Generally speaking Glass's work as a whole has tended to be more
means-ends and more technically oriented while Scriven's has tended to
rely more on pragmatic argument. In fact, Scriven's goal-free evaluation
might be regarded as an ultimate expression of pragmatic argument. One
does not care about the expressed ends at all but only about the
consequences of the object under evaluation. Generally, conceiving an
evaluative problem in "means-ends" logic tends to devalue the means in
relation to the ends, while conceiving the same problem in
"event-consequences" logic tends to make the event relatively more
important. Scriven's challenge to Glass culminates eventually in a
discussion over the ends of the cassette approach.

On a more abstract level the dispute is between two principles of rational
choice, the principle of effective means and the principle of inclusiveness
(Rawls, 1971). The principle of effective means stipulates that, given the
objective, one is to achieve it with the least expenditure of means or, given
the means, one is to fulfill the objective to the fullest possible extent. In
other words one is to adopt the best alternatives.

The principle of inclusiveness stipulates that one alternative plan is to
be preferred to the other if it would accomplish all the aims of the other
plan plus some additional aims. In arguing for the cassette approach as
"repertoire" expanding but not as a total substitute for the typescript,
Scriven is so arguing.

The few differences between Glass and Scriven should not obscure the
many similarities of their evaluative argument. Both accept comparison of
costs and benefits as the method of choice. Both rely heavily on "structure
of reality" arguments, Glass relying a little more on relations of
coexistence, e.g., the relations between a person and his acts and between a
person and his group. Scriven relies slightly more on sequential relations
arguments, especially pragmatic argument. In spite of structure of reality
arguments, there is little surveying of others for information. Both rely
on their own personal observations for primary data.

Secondarily, both use quasi-logical arguments, though only about half
as often as the above arguments. Both use arguments attempting to
establish the structure of reality, e.g., examples, analogies, etc., only
once. An entirely different type of evaluation would have been to put the
cassettes into use in the field and to collect anecdotes about how they are
used. This type of evaluation will be discussed in the next section as
"naturalistic" evaluation.

Both Glass and Scriven use more than twenty-five arguments in their
articles, although Scriven's article is half as long as Glass's. Scriven's high
argument density reflects his general style: he is apt to spin out a number
of reasons for a given judgment one after the other in a profuse and linear
fashion. Here and elsewhere, Glass offers fewer reasons but they are more
carefully articulated with one another, some arguments carefully nested
within others.

Partly because of this, Glass's piece is more coherent and aesthetically
pleasing than is Scriven's. Scriven is at the disadvantage of having to
respond to Glass's paper rather than creating a full-fledged argument
form of his own, as he did, for example, in his goal-free evaluation paper
(Scriven, 1973). The somewhat rambling flow of Scriven's response as he
answers various points in Glass's paper detracts from the overall
persuasiveness of his arguments. It is a serious disadvantage that every
respondent to a document must face.

Finally it should be noted that this exchange between two of the
foremost evaluation theorists is not primarily over data. Rather the dispute
is over the proper comparison for the object under evaluation, which is
eventually traceable to the argument form preferred and the audiences
addressed. Some people think that all disputes can be resolved by data
but such is not the case. It is often the logic of the evaluation that is in
dispute.

NATURALISTIC EVALUATION 

When one reads a novel or poem, something is learned. If someone were
to ask what has been learned, it would be difficult to say. Often the
knowledge gained from such reading is not in propositional form. Yet in the
reading of such works, experience from the novel or poem is mapped onto
the mind of the reader. The kinds of generalizations the reader acquires
have been called "naturalistic" (Stake, 1976) or "spontaneous" (Perelman
& Olbrechts-Tyteca, 1969).

The class of arguments that try to establish a structure of reality and
assume the least agreement in advance between the author and audience
are those most used in "naturalistic" evaluation. They include example,
illustration, analogy, and metaphor. I would label as "naturalistic" an
evaluation which attempts to arrive at naturalistic generalizations on the
part of the audience; which is aimed at non-technical audiences like
teachers or the public at large; which uses ordinary language; which is based
on informal everyday reasoning; and which makes extensive use of
arguments attempting to establish the structure of reality. In this category I
would include most case study evaluation (Stake, 1976; Smith & Pohland,
1974; Parlett & Hamilton, 1972; and MacDonald & Walker, 1974) and
also those employing legal procedures (Levine, 1973; Owens, 1973; and
Wolf, 1974).

Denzin (1971) described the naturalistic approach in sociology. It
attempts to blend the "covert, private features of the social act with its
public, behaviorally observable counterparts. It thus works back and forth
between word and deed, definition and act." The observer is a part of the
research act and reflections on the self may be important data. The
research begins with troubling issues and admits any and all relevant ethical
data.

The focus is on the complexity of everyday life, and naturalism tries to
understand the everyday world in the experience of those who live it. The
naturalist shows profound respect for the empirical world. Participants
serve as constant sources of ideas and as checks on the developing ideas
of the naturalist. Multiple perspectives are essential to portray the whole
picture. The naturalist carries on and perhaps records covert dialogues
with himself as he tries to explain events.

Since the focus is on understanding various interactions, the naturalist 
must follow events over time. He searches for explanations, rather than 
predictions, and explanations must usually be grounded in the
retrospective reasons people give for their own and others' behavior. This
necessitates considerable submersion in the participants' culture and
language. Joint actions are major points of attention, and they have to be
seen in some historical perspective.

Validity is provided by cross checking different data sources and by
testing perceptions against those of participants. Issues and questions
arise from the people and situations being studied rather than from the
investigator's preconceptions. Concepts and indicators "derive from the
subject's world of meaning and action." In constructing explanations, the
naturalist looks for convergence of his data sources and develops
sequential, phase-like explanations that assume no event has single causes.
Working backwards from an important event is a common procedure.
Introspection is a common source of data.

Of course, the sociologist is interested in constructing a generalizable
theory. The naturalistic evaluator is interested only in the case he is
evaluating. The sociologist will try to justify his conclusions to a universal
audience. The naturalistic evaluator must adjust his work to a particular
audience, who may even be the participants of the program he is
evaluating. In presenting their studies both will rely heavily on examples
and illustrations drawn from the field. The evaluator may or may not draw
specific conclusions from the examples. If the examples are collected and
presented systematically, their logic will resemble that of inductive
reasoning. However, in naturalistic evaluation the audience always has the
choice of how to interpret the findings and of how much credibility to
assign them.

Evaluations using examples and illustrations extensively, even
evaluations which consist entirely of one extended example, are becoming
commonplace. They are particularly important when appealing to
non-technical audiences who are not familiar with more arcane forms of
quantitative argument and to audiences for whom the evaluator can make few
assumptions about the premises of agreement. School practitioners fall
into both these categories. It is dangerous to presume that practitioners
start from the same values and see reality the same way as evaluators or
government officials.

Analogies and metaphors are seldom used in evaluation, because they
are often perceived as mere figures of speech and thus unreliable data.
They are, however, important ways of arriving at naturalistic
generalizations. Petrie (1976) suggested that Kuhn's exemplars convey
cognitive categories essential for an initiate to understand scientific
theories. Ortony (1975a, 1975b) discussed the ways in which metaphors work
to extend thought.

Ortony contends that words do not precisely convey the flow of
experience as it is presented to the human mind. Experience is continuous and
non-discrete, and even though words do not have distinct meanings like
logical symbol systems, neither do they accurately represent all forms of
experience. By "particularization" metaphors help bridge the gap between
language and experience. Particularization conveys mental images to the
mind of the reader. A term like "fearless warrior" evokes meaning more
succinctly and compactly than does a longer description. In addition
metaphors can capture distinctions that are otherwise inexpressible.

According to Ortony, another characteristic of metaphors is their
vividness. They are closer to experience and convey emotional as well as
cognitive and sensory meanings. This imageability is associated with
learnability. Metaphors facilitate insight and personal understanding by
moving from the known to the less known. They facilitate naturalistic
generalization on the part of the audiences. It is critical, however, that
the author understand his audiences in order to know whether a metaphoric
assertion will expand understanding or simply pass the audiences by.

Ortony also extends this conception of language into the
teaching-learning situation. Drawing upon Polanyi's idea of tacit knowledge,
he contends that the teacher must always know much more than he can express
in propositional form. It is this tacit knowledge, partially a knowledge of
contextual application, that is the deep understanding of a field or
discipline. In order to communicate knowledge to a student, the teacher must
select from his tacit knowledge and try to represent it in propositional
terms. The propositional form is always somewhat removed from the full
tacit understanding.

The student initially sees only the propositions. It is like learning to ride
a bicycle by reading a set of instructions. The beginner's behavior is
controlled by the explicit propositional knowledge which is inadequate. It
is here that the teacher can aid the student by examples, metaphors, and
non-literal language.

Scientists trying to learn their discipline have similar problems.
According to prominent critics, it would be impossible to learn a scientific
discipline by following a set of rules (Polanyi, 1958; Kuhn, 1970). According
to Kuhn, a scientist learns his discipline through a set of exemplars,
concrete problems permitting solutions that enable the novice to make
comparisons with other disparate problems. The shared meaning is
transferred through these experiences and not through rules.

The similarity between naturalistic generalizations in evaluation through
the use of examples and metaphors and other arguments which attempt
to establish a structure of reality is clear. Understanding and insight on
the part of the audience is facilitated even though there may be no
scientifically verified propositions in the sense of formal logic. Even though
its epistemological and psychological assumptions are somewhat different
from other types of evaluation, naturalistic evaluation is still a form of
argumentation.

OBJECTIVITY, VALIDITY, AND
IMPARTIALITY RECONSIDERED

What does it mean to say that an evaluation study is "objective" or
"valid?" Few concepts have been so confused and have caused so much
mischief in educational inquiry. Many people are reluctant to accept or
believe qualitative evaluations simply because they are based on only one
person's observations. Observations by one person are considered in and
of themselves to be subjective and hence illegitimate for public purposes.

The crux of the confusion lies in misconceiving "objectivity." Scriven
(1972) has written cogently and brilliantly about this confusion, tracing
the unfortunate history of how objectivity has been defined. The theme of
most definitions of objectivity is that there is something outside the mind
that is verifiable through public or intersubjective agreement and that one
can express or prove such things without influence from personal feelings.
An evaluation which can do so is objective. But can one person's view ever
be "objective?" The difficulty lies in confusing objectivity with procedures
for determining intersubjectivity.

Scriven (1972) contends that there are two different senses in which
objectivity is used, the quantitative and the qualitative. In the
quantitative sense of the term, one person's opinion about something is
regarded as being subjective, the disposition of one individual. Objectivity
is achieved through the experiences of a number of subjects or observers.
The common experiencing makes the observation public through
intersubjective agreement. More formally, one might say that with a number
of individuals one is more certain that one has properly represented the
population, a sampling problem.

The qualitative sense of objectivity is quite different. It refers to the
quality of the observation regardless of the number of people making it.
Being objective means that the observation is factual, while being
subjective means that the observation is biased in some way. Is it possible
for one person's observations to be factual while a number of people's
observations are not? Indeed it is. So an observation can be quantitatively
subjective (one man's opinion) and also qualitatively objective (actually
unbiased and true).

In fact, one might contend that the types of biases that affect the
opinion of one person are somewhat different from those biases that plague
group opinions. For example, an individual may succumb more easily to
idiosyncratic viewpoints since he can hold only one perspective. On the
other hand, there are social and cultural biases to which a group is more
susceptible than is a particular person, e.g., jingoism. The individual's
qualitative objectivity can be assessed by his previous track record on such
matters and by his current self interests. In any case, one who subscribes
entirely to the quantitative notion of objectivity is not going to be satisfied
with approaches like case studies.

How did the quantitative notion equating the number of people making
an observation with its truth gain such ascendency, even to the point of
excluding qualitative objectivity? Scriven traces this distortion to
psychology's attempt to root out introspectionism and philosophy's attempt
to purge obscure metaphysics. Both tried to do so through the verification
principle. Intersubjectivity became operationalized as the criterion for
objectivity. In its extreme form the equating of objectivity with the
quantitative notion of intersubjectivity was manifested in methodological
behaviorism and in operationalism. But the fallacy of intersubjectivism
pervades all fields.

Scriven cites the example of an evaluation of a television antenna in an
electronics magazine in which the evaluator can see and report a better
picture resulting from one of the tested antennas. Yet the evaluator
apologizes for being "subjective" in his approach since he did not use an
instrument to measure decibel gain. In fact, as Scriven notes, it is possible
to get intersubjective agreement without instruments on the performance
of electronic equipment and it is the case that these pooled judgments of
quality do not correlate highly with any instrument readings. Why then
is an instrument reading objective while one person's judgment is
subjective in the perception of this confused evaluator?

The reason is that the evaluator is only one person making the
observation, and even though he knows he could have his observation confirmed
by calling in his colleagues, he believes an instrument would be better
because he can get even higher agreement among observers on the meter
reading itself, even though the meter reading is not highly indicative of
quality. In this case the quantitative notion of intersubjectivity has
supplanted the quality of the perception.

In operational terms "measuring on a quantitative scale by mechanical
means" becomes the indicator of truth because the interjudge reliability
is higher, according to Scriven. Simultaneously one has actually sacrificed
validity for reliability because the meter reading, while reliable, is not a
good indicator of picture quality. This is one of the common errors of
evaluation: the substitution of instruments for direct observation of
quality, the substitution of reliability for validity. And it is an error of the
first magnitude.
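
The distinction between reliability and validity drawn above can be made
concrete with a small numerical sketch. Everything in it is hypothetical: the
five antennas, the two observers' meter readings, and the pooled viewer
judgments are invented for illustration and are not Scriven's data. Reliability
is taken here to be the correlation between two observers' readings of the same
meter, and validity the correlation between the meter and the pooled judgments
of picture quality; both operationalizations are assumptions of the sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data for five antennas (invented for illustration only).
meter_observer_a = [1, 2, 3, 4, 5]   # meter reading as recorded by observer A
meter_observer_b = [1, 2, 3, 5, 4]   # same meter as recorded by observer B
pooled_quality = [3, 5, 1, 2, 4]     # pooled viewer judgments of picture quality

# Interjudge reliability: do two observers agree on the meter reading?
reliability = pearson(meter_observer_a, meter_observer_b)

# Validity: does the meter reading track judged picture quality?
validity = pearson(meter_observer_a, pooled_quality)

print(f"interjudge reliability of meter: {reliability:.2f}")
print(f"validity of meter against judged quality: {validity:.2f}")
```

With these invented numbers the observers agree closely on the meter while the
meter barely tracks judged quality, which is the pattern the text describes as
reliability being substituted for validity.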

From this idea, that what cannot be directly experienced by others
cannot be taken seriously by science (intersubjectivism), has developed
the concept of objectivity as the externalization of all references so that
multiple witnessing can be achieved, a gross oversimplification according
to Scriven. In educational inquiry, this has been manifested in equating
objectivity with the ability to specify and explicate most completely all
data collection procedures. Complete externalization and objectification
permit replication, the hallmark of reliability. In education being
objective has come to mean having a "valid" instrument, just as with the
electronics evaluator.

What exists, in fact, are highly reliable instruments the validity of which
is questionable. They do not always correlate highly with judgments of
educational quality. The distortion of the intersubjectivist verification
principle has resulted in equating objectivity with externalized, replicable
procedures, though these procedures may be infected by biases and
hence be qualitatively subjective.

The identification of objectivity with a completely specifiable external
procedure has another important effect. It relieves the evaluator of
responsibility for the results and consequences of the evaluation. After all,
if these "objective" instruments and procedures give these results, how
can the evaluator be held liable? Science is to blame. Polanyi (1958) calls
this position "objectivism." Objectivity in this sense comes to mean that
observations are subject to independent verification without reference to
the person who produced them.

Now it is not possible to specify all knowledge explicitly nor to verify it
completely by independent external procedures. Scriven contends that
even in mathematical proofs in which the steps of the proof are reduced
to the self-evident, intuition plays an inevitable and important role. Not
only is intersubjective verification not a guarantee of truth, it is not even
necessary. Truth is an ideal which is approximated through an interplay
of introspection and public verification.

Because of their complexity, many intuitive judgments can never be
fully explicated. But conclusions may be no less true because of one's
inability to explicate them. Agreement among many may be necessary for
explaining the truth to someone else but it is not necessary for the truth
itself.

How is it possible to establish the validity of a claim if one cannot
separate it entirely from the person making the claim? One way is to
check the reliability of the observer in previous instances and to check the
observer's freedom from bias. These are not guaranteed to produce truth
but there are no guarantees anyway. There are knowledge claims that are
hybrids of the internal-external split, e.g., tendency statements, analogies,
approximations, that are true yet are not the types of claims one usually
associates with scientific statements, according to Scriven. He calls them
"weak knowledge" claims and suggests they represent the type of
knowledge available in the social sciences.

Such knowledge claims are manifested more as explanations than as
predictions. Explanation and understanding are functions of the way
information is coded in the mind. Explanation implies a person who is
understanding the explanation. It does not exist by itself. The
understanding is ultimately reducible to something familiar in the mind of the
audience doing the understanding, or else it is not an explanation.

Similarly, unless an evaluation provides an explanation for a particular
audience, and enhances the understanding of that audience by the content
and form of the arguments it presents, it is not an adequate evaluation for
that audience even though the facts on which it is based are verifiable by
other procedures. One indicator of the explanatory power is the degree
to which the audience is persuaded. Hence an evaluation may be "true"
in the conventional sense but not persuasive to a particular audience for
whom it does not serve as an explanation. In the fullest sense, then, an
evaluation is dependent both on the person who makes the evaluative
statement and on the person who receives it.

Prediction is not necessary to demonstrate understanding. Inferring
another event from a correlation coefficient plus a few antecedent
conditions is not necessary as a test of validity or objectivity of an observation
or an evaluation. Rubbing bare observations together to produce sparks
of correlations is a forlorn enterprise in much social inquiry. Rather, the
basic reasoning pattern is closer to one of pattern matching, of finding
reasonable interpretations and explanations and understandings within a
given context. The test of an explanation is not accuracy in predicting an
event but whether the audience can see new relations and answer "new
but relevant" questions.

Finally, about the question of objectivity one must conclude one of two
things: either objectivity cannot be exclusively identified with an
externalized procedure totally separated from the minds that produced the
observations and comprehended them; or else a great deal of truth is
subjective in character. In the first case, objectivity means something
more than it is commonly taken to mean; in the second case, it means
something less.

What about validity? One definition of validity is that it is based on
objective procedures. Validity carries with it the notions of being properly
related to intent, of being correctly derived, and of being sanctioned by
authority. In the narrow sense of quantitative objectivity, validity is
equated with prediction, with checking the data against a criterion. But
that assumes a single intent and assumes intersubjectivism as the
verification principle. This is too narrow a procedure. Ultimately, says
Cronbach (1971), validity is dependent on how the data are to be used and
"utility depends upon values, not upon the statistical connections of
scores."

If one cannot arrive at a single score presumably indicating validity,
how is validity determined? Perhaps the best answer to the question is to
examine the sources of invalidity. An evaluation may be invalid in a
number of ways. One way is for the "facts and truths" upon which the
evaluation is based to be wrong. Facts and truths are subject to the
agreement of the universal audience. Many facts and truths are accepted
without question by everyone. Other data must be determined by recognized
data collection procedures, which are in turn sanctioned by a particular
discipline and subject to public scrutiny. Often validity refers to using
the accepted data collection procedures themselves, as Cronbach's article
on test validation suggests.

Another way in which validity is at issue is in relating conclusions and
interpretations to the data. As Cronbach asserts, it is not the test or the
data collection procedures themselves so much as the interpretations that
are valid or invalid. This is the validity of an inference. Is the inference
correctly derived from the data and premises?

There is also the question of whether the interpretation can be properly
applied to situations other than the one from which it was derived, since
all generalizations are context dependent. These concerns have been dealt
with in experimental design somewhat systematically as threats to internal
and external validity.

In qualitative studies it is more difficult to provide evidence of validity,
which is not a sign that it does not exist. Demonstrating validity in
naturalistic studies usually consists of confirming one kind of data with another
kind. In proposing case studies of science education, Stake and Easley
(1976) saw personal biases and past experience as the main threat to the
credibility of the case studies. They proposed extensive tape recording of
interviews, extensive use of direct quotations where possible, and
reporting disagreements among respondents where they existed. People familiar
with the local situation could read the written case to judge the accuracy
of portrayal. The field workers would be keyed to "hints of inconsistency"
for further pursuit. In instructions to on-site observers doing the studies,
Stake (1975) urged confirming the observations through replication.
Contexts for observations would be documented and elucidated. Securing the
observations of several participants about a particular issue or event was
a way of "triangulating" what actually happened.

Most of these threats to validity are seen from the perspective of a
universal audience. But there is another way of looking at validity in
evaluation: whether the evaluation is valid for particular audiences. After
all, validity is always concerned with purpose and utility for someone. If
the evaluation is not based on values to which the major audiences
subscribe, these audiences may not see it as being "valid," i.e., relevant to
them in the sense of being well-grounded, justifiable, or applicable. The
evaluation may simply miss the main issues as far as particular audiences
are concerned. At the same time the evaluation may be valid in the sense
that the facts are correct and the inferences from the data correctly
derived. From a particular audience's perspective, the premises may be the
wrong ones.

An evaluation can also be invalid in this secondary sense if the argument
forms employed are wrong. For example, in this society "means-ends"
arguments, particularly cost effectiveness arguments, are particularly
potent. If one were to employ an argument based on maximizing excellence
instead of choosing the best available alternative, it might carry little
weight although being equally true and valid from the perspective of the
universal audience. So validity can apply to evaluation in two rather
different ways.

It is also the case that the more "naturalistic" the evaluation, the more
it relies upon its audiences to draw their own generalizations (external
validity). For example, a case study may be interpreted in different ways by
each reader, since each reader has his own universe of cases in his mind
for comparison. The reader can see similarities and differences based on
his own experience and can draw his own interpretations.

Conceiving the process of generalization in this way alters even the first
sense in which validity is used. The evaluator is still responsible for
ascertaining and reporting "true" facts and statements, but part of the
interpretation is beyond him. Since, as Cronbach says, the ultimate issue is the
validity of the interpretation, which only the reader knows for sure, the
audiences must assume considerable responsibility for the validity of their
own interpretations. The evaluator must ultimately assume rational
processes in the thinking of the audiences.

As Ennis (1973) noted, internal validity and external validity refer
to rather different phenomena. External validity is concerned with the
generalizability of general causal statements. Internal validity bears on
specific causal statements that do not entail generalizing to new cases.
Generalizing always assumes that one knows the relevant laws involved in
extrapolating into new realms. An internally valid study, by contrast, only
claims causality in the past within the specific circumstances. It claims
no extrapolation and is hence less dependent on outside assumptions.

However, neither specific causal statements nor general causal
statements follow perfectly logically from observations, even in the best
experimental designs. Some empirical assumptions are needed even in the
tightest design. In addition, identifying a particular event as a cause
inescapably involves a judgment of responsibility that a particular event
and no others is responsible for the effect, according to Ennis. This
ascription of responsibility requires much background knowledge and a value
judgment. It involves a probable assignment of praise or blame and
suggests a place for intervention.

Most evaluators would assume responsibility for specific causal
statements that "x caused y" in this study (internal validity), although this in
itself necessarily involves a set of assumptions. But some would refer the
generalizability of the findings to the audiences' judgments, since
generalizability is based on outside information which the audiences but not the
evaluator may have. The audiences might make some of the responsibility
ascriptions based on their own background knowledge and values. Some
evaluators, particularly naturalistic ones, might argue that this would
ultimately result in superior generalizations.

There is yet a further related problem with objectivity. Is it really
sufficient to say that an evaluator is objective? If objectivity is taken in the
commonly used sense of employing an externalized, specifiable procedure
which produces replicable results, then it is certainly an insufficient
criterion for an evaluation. The administration of standardized achievement
tests is a totally externalized, specifiable procedure which produces
replicable results. At the same time such tests are thought to be highly biased
in many ways, particularly towards minority groups. In this sense, one has
an objective but biased instrument. In fact one can produce an instrument
in which the bias is in the other direction. (To further confound matters,
if racial discrimination is the intent of such an instrument, one could have
an objective, valid instrument for that purpose.)

An evaluation must be free from distortion and bias (qualitatively
objective) and being externalized, specifiable, and replicable does not
sufficiently address possible biases. Even qualitative objectivity is insufficient
for evaluation, for it carries the aura of neutrality. People being evaluated
do not want a neutral evaluator, one who is unconcerned about the issues.
A person on trial would not choose a judge totally removed from his own
social system.




Being disinterested does not give one the right to participate in a
decision that determines someone's fate to a considerable degree. Knowledge
of techniques for arriving at objective findings is inadequate. Rather, the
evaluator must be seen as a member of or bound to the group being
judged, just as a defendant is judged by his peers. The evaluator must be
seen as caring, as interested, as responsive to the relevant arguments. He
must be impartial rather than simply objective.

The impartiality of the evaluator must be seen as that of an actor in
events, one who is responsive to the appropriate arguments but in whom
the contending forces are balanced rather than non-existent. The
evaluator must be seen as not having previously decided in favor of one
position or the other.

The evaluator may resort to objective criteria to resolve the issues, but
when his own impartiality is at stake, it is not enough that he give evidence
of objectivity. He must give evidence of his impartiality by showing how he
has acted contrary to his own interests in the past.

EVALUATIVE DISCOURSE: THE GOOD LIFE 
(ALONG THE SAN ANDREAS FAULT) 

It has been several weeks since I began this paper. The great Los
Angeles earthquake has not yet come. Beautiful day succeeds beautiful
day, each one much like the last, so it seems tomorrow must be like
today, a pleasant dream extending indefinitely (argument by unlimited
development).

Each day that passes makes the quake seem less likely than before. Yet
if it is to occur this year, it should become more likely. I reason that the
time I have remaining here is only a small fraction of the coming year, so
the chances of the quake coming now are less than for the entire year of
the prediction (argument by probability). I reason that even if the quake
should come, the effects will not be disastrous (argument by consequences).
In addition, the Midwest is racked by tornadoes (argument by
comparison). Besides, would many of the smartest men in the country, including
the seismologists, live here if the danger were so great (argument by
incompatibility)? I feel reassured. My anxiety lessens.

Meanwhile within the last few days, the New York Times Magazine
heightens the drama in its Bicentennial edition (July 4, 1976). As
symbolic of "America at 200," it features a report on "The Good Life (along
the San Andreas Fault)." On the cover is a painting of a fragment of a
freeway jutting out into the empty ocean, the remains of Los Angeles after
the next earthquake. The article begins with a six-paragraph scenario of
the effects of the anticipated quake (arguments by symbol and illustration).

Those who literally live on top of the nine-mile-deep fault have their
own reasons for living there. As his backyard crumbles away daily, a
postal worker, who has three cars, would like to move but cannot sell his
house (argument by sacrifice). A ranch manager who finds life better in
California than anyplace he has ever lived explains, "I'm not leaving. Is
there any place that doesn't have some catastrophe (argument by
comparison)?"

For some the precariousness itself makes being here all the more
precious. A dropped-out investment counselor living on the fault says,
"You're living on a crisis point. Everything you have can be taken away
from you at any time." More than any place, in every way, California is
a challenge to the argument of unlimited development.

These are not the reasons I would give but they may be right. Each man
is free to discover his own reasons. Each man is free to make his own
choices. So it must be when faced with such uncertainty of knowing.
Judgments cannot be based on an irrefutable reality. There will be a day when
earthquakes are much more predictable than now. Even then, there will
remain room for choice in how to respond. In social decision making
certainty seems remote if not impossible.

Faced with such difficulty in arriving at an irrefutable reality, there are
those who try to force simplicity atop the complexities of life and thereby
eradicate ambiguity. They insist on pretending there is agreement where
there is none, whether of facts or of values. Often in positions of power,
they impose arbitrary definitions of reality for the sake of action. Yet
reality is still there. Whatever even twenty-one million Californians
believe, the great earthquake will come eventually.

The alternative is not necessarily a descent into irrationality. If opinions
cannot be indisputably based, neither must they be regarded as entirely
arbitrary, as being merely "value judgments." Such a classification
identifies as knowledge only that which is clear, distinct, and unambiguous.
This distinction establishes a schism between objectively true theoretical
knowledge on the one hand and action based on irrational motives on the
other. It culminates in designating as irrational those who do not agree
with such a perspective. Classifying people as irrational justifies ignoring
their opinions and perhaps their dignity and interests. It even legitimates
using suggestion and force on them.

The alternative is to treat all men as rational. Between the conservative
authoritarianism of tradition and the liberal authoritarianism of scientism,
between the certainty of fanaticism and the irresponsibility of skepticism
lies rational deliberation. One must take seriously the opinions of other
people and engage them in serious discourse. This is the realm of
argumentation and the proper sphere of evaluation.

The starting point is that groups of people adhere to opinions with
variable intensity and that these beliefs can be put to the test of serious
discourse. Even facts and values may be so considered. Rational discourse
consists of giving reasons, although not compelling reasons. In the realm
of action, where few things are clear and distinct, motivation can be
rational. Practice can be reasonable.

The evaluator must engage his audiences in a dialogue in which they
are free to employ their reasoning. This means that the audiences must
assume personal responsibility for their interpretation of the evaluation
since the reasoning presented to them is neither completely convincing
nor entirely arbitrary. This means that the evaluator must also assume
personal responsibility for his judgments since he cannot hide behind
blind method. Both must exercise their natural reason.

Educational Product Evaluation: A Prototype Format Applied*
Gene V. Glass

Laboratory of Educational Research

University of Colorado

The conventions and techniques for evaluating educational products are not
yet well established. Only recently have instructional materials and procedures
been viewed as products to be developed and evaluated. Although the general
procedures appropriate for evaluating consumer products are applicable to
educational products, the unique characteristics of the education context raise special
evaluation considerations. This paper addresses a "shelf item" educational product
that is of interest in its own right.

I. Product Description. 

The product evaluated here is an instructional cassette recording "Evaluation
Skills" (Tape 6B) created by Dr. Michael Scriven (Department of Philosophy,
University of California, Berkeley) and produced by Dr. W. James Popham (School
of Education, University of California, Los Angeles) for the American Educational
Research Association (1126 16th St., N.W., Washington, D.C.) under a grant
from the U.S. Office of Education. The recording is intended primarily for
in-service training of educational researchers and can be purchased for $6.00 from
AERA.

The recording consists of a lecture on fundamental concepts of evaluation. The
lecture is about 7,500 words long (the equivalent of approximately 17 single-spaced
pages of typescript) and runs about 45 minutes. The 100 foot tape cassette can be
played on any standard cassette player.

II. Goals Evaluation.
Product Goals are:

• To train educational researchers and others in the fundamentals of
  educational evaluation. The tape was commissioned ". . . to give the listener at
  least one important technical skill relating to educational research. . . .
  Although primarily intended as an update device for the educational researcher
  who has completed his formal training, many professors will find the tape ideal
  for their graduate classes." (Educational Researcher, Vol. 22, June 1971, p. 2)
• To provide an instructional product which can be used in situations (e.g.,
  while driving) in which typical instructional products can't be used.
• To experiment with new instructional media.

There can be little quarrel with the first goal. Evaluation skills are in short
supply. Legislation has created a significant demand for such skills, and a need
for training in evaluation is commonly and justifiably expressed.

Making better use of otherwise dead time in commuting is commendable. The
cassette tape is one of the few instructional media well suited to turning this
unproductive time into something worthwhile. It is too soon to tell whether the
ultimate, long-range effects of encroachment upon such private time will be
undesirable. Nonetheless, it must be recognized that in extending an instructional
opportunity into time formerly not so used one may also be contributing to the
destruction of peoples' senses of identity as persons apart from the roles they play
as less than fully autonomous workers in huge, impersonal bureaucracies.

*Glass, Gene V. "Educational Product Evaluation: A Prototype Format Applied." Educational
Researcher, January 1972, Vol. 1, No. 1, pp. 7-10, 16. Copyright 1972, American Educational
Research Association, Washington, D.C.
Permission to reprint has been granted by AERA.

The goal of experimenting with new media is commendable to the degree that
the choice of media for experimentation is made wisely (i.e., on the basis of data
concerning costs, probable effectiveness, availability, longevity) and is not mere
technological tinkering.

III. Clarification of Point of Entry of the Evaluator:
Irreversible Decisions

• USOE's grant to Popham (Program Director) to produce tapes

• Popham's choice of topics and lecturers

• Lecturer's choice of subject matter under the topic of "evaluation"

• AERA's reproduction of initial copies of the tape

Reversible Decisions (Enter the Evaluator)

• AERA's vending of initial copies

• AERA's choice of materials (cassette tapes)

• AERA's plans to sell additional copies of tapes in present form

• AERA's lack of plans to publish and distribute typescript

IV. Trade-Offs.

A series of trade-offs are involved in the production and application of this tape.
USOE traded off to produce the tape

• One-fourth of a 5-day training session for as many as 100 researchers

• The printing costs of 20,000 copies of 25 pages of prose materials for research
  training

• Half of one year's stipend for a doctoral-level educational research trainee

• Four all-expense scholarships for minority researchers to the AERA training
  session of their choice

The Cassette Tapes Project director traded off to produce the tape

• The production of typescript copies of the lectures

• The production of recorded synopses of several classic papers on educational
  evaluation

AERA continues to trade off to sell and produce the tape

• A small amount of managerial labor

The individual educational researcher would trade off to buy the tape

• Purchase of any four numbers in the AERA Curriculum Evaluation
  Monograph Series

• Purchase of Wittrock & Wiley's The Evaluation of Instruction, or the April '70
  issue of the Review of Educational Research, or Suchman's Evaluative
  Research, etc.

• Purchase of photocopies of a half dozen significant published papers on
  educational evaluation

The trade-off with the greatest leverage that would retain the intent of the
producer concerns the decision to produce and distribute the lecture as a cassette
recording rather than a typescript. Thus, the evaluation of the product will have
a prominent comparative element in which a typescript of the lecture is the
alternative product.

TABLE 1:
Cassette Recording Versus Typescript Costs

Cassette Recording

1. Production of master copy
   a. cost of tape only                                            $6.00
   b. cost of lecturer's services and expenses                     $700.
2. Reproduction of copies
   (cost of additional cassettes only; no economy of scale)        $6.00/copy
3. Mailing costs (4th class book rate)                             $0.14
4. Operation of cassette recorder
   a. purchase of recorder (price quoted on cheapest model)        $25.00
   b. rental of recorder (rates range from $2.50 to $5.00 per day) $3.75/day
   c. operation of recorder                                        $0.00
5. Net cost of production and distribution of 100 cassette
   recordings (excluding lecturer's services)                      $614.00

Typescript

1. Production of master copy
   a. typing of 40 pages, double-spaced typescript                 $8.00
   b. cost of lecturer's services and expenses                     $700.
2. Reproduction of copies (cost of paper and photocopying;
   no economy of scale above 1,000 copies)                         $0.40/copy
3. Mailing costs (4th class book rate)                             $0.14
5. Net cost of production and distribution of 100 typescripts
   (excluding lecturer's services)                                 $62.00
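
The two net-cost rows of Table 1 can be checked with a few lines of arithmetic.
The sketch below is not part of Glass's article; it assumes that net cost is a
one-time master cost plus 100 copies at the per-copy reproduction and mailing
rates, and it works in cents to avoid rounding. Under that assumption the
cassette figure of $614.00 is reproduced only if the master tape is not charged
separately from the 100 copies, which is an interpretation rather than something
the table states.

```python
# All amounts in cents, taken from Table 1; the function name and the
# cost model (master + copies * (reproduction + mailing)) are assumptions.
COPIES = 100
MAILING = 14  # 4th class book rate, per copy


def net_cost(master, reproduction_per_copy, copies=COPIES, mailing=MAILING):
    """One-time master cost plus per-copy reproduction and mailing."""
    return master + copies * (reproduction_per_copy + mailing)


# Cassette: $6.00 per copy; master tape assumed not charged separately.
cassette_total = net_cost(master=0, reproduction_per_copy=600)

# Typescript: $8.00 typing of the master, $0.40 per photocopied copy.
typescript_total = net_cost(master=800, reproduction_per_copy=40)

print(f"cassette:   ${cassette_total / 100:.2f}")
print(f"typescript: ${typescript_total / 100:.2f}")
print(f"ratio:      {cassette_total / typescript_total:.1f} to 1")
```

Both totals match the table's row 5 figures ($614.00 and $62.00), so under this
reading distributing 100 cassettes costs roughly ten times as much as
distributing 100 typescripts, the lecturer's fee aside.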

V. Comparative Cost Analysis.

Table 1 summarizes the comparison; additional cost considerations follow.

Simultaneous mass listening. For simultaneous teaching of 10-50 persons,
the cassette recording could be economically used, even though there is significant
distortion at higher volume on the N^ilovac (CR 203) Cassette Recorder. 

Tape costs. The tape appears to be of high quality, perhaps too high since the
voice frequencies of the tape do not require high fidelity reproduction. Since the
lecture is only 45 minutes, it could have easily been recorded on a shorter, thicker,
less expensive tape. There are other disadvantages of the thinner, more expensive
tape: it tends to bind on cheaper players; print-through can occur in the recording
process. It is presumed that nearly $5 was paid for each cassette. The evaluator
has priced cassettes of acceptable quality at $0.75 per 60 minutes playing time
(source: University of Colorado Bookstore). The entire cost of the tape cassette and
reproduction from a master tape can be held below $2.00 (authority: Dept. of
Audio-Visual Instruction, Univ. of Colo.).

Storage costs. A 40-page typescript would occupy 65 in³ of storage space
compared to the 10 in³ occupied by the cassette recording. If storage space became
quite costly, the cost advantage would swing toward the cassette recording.
However, under such circumstances the typescript could be transferred to microfiche,
for which storage (and usage) costs would be substantially below those of the
cassette recording.

Reducing costs for the typescript. Prices for the typescript version of the lecture
are quoted on a 40-page double-spaced manuscript. These costs could be significantly
reduced by the following means: a) editing redundancies from the lecture could
reduce length by 10 per cent; b) single-spaced typing could reduce typescript
length by almost half. Both a) and b) would result in a typescript version of the
recording which could be sold for less than 20
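
The effect of the two reductions can be sketched numerically. This Python fragment rests on assumptions not stated in the source: that reproduction cost scales linearly with page count (so the quoted $0.40 for 40 pages is $0.01 per page) and that "almost half" is approximated by exactly one half. The names are illustrative.

```python
from decimal import Decimal

ORIGINAL_PAGES = Decimal("40")
REPRO_COST_40_PAGES = Decimal("0.40")   # per-copy reproduction cost quoted in Table I

# Assumed reduction factors: a) 10 per cent shorter after editing,
# b) single spacing roughly halves the page count ("almost half").
EDITING_FACTOR = Decimal("0.9")
SPACING_FACTOR = Decimal("0.5")


def reduced_pages() -> Decimal:
    """Approximate page count after applying both reductions."""
    return ORIGINAL_PAGES * EDITING_FACTOR * SPACING_FACTOR


def reduced_repro_cost() -> Decimal:
    """Per-copy reproduction cost, assuming cost is linear in pages."""
    cost_per_page = REPRO_COST_40_PAGES / ORIGINAL_PAGES
    return reduced_pages() * cost_per_page


if __name__ == "__main__":
    print(reduced_pages(), reduced_repro_cost())
```

Under these assumptions the typescript shrinks to about 18 pages, with a per-copy reproduction cost of about $0.18 before mailing.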


VI. Intrinsic (Secondary) Evaluation.

Technical Quality

Tape quality. Excellent. (But unnecessarily expensive.)

Recording fidelity. Excellent. The tape is free of background noise; volume is
even.

Esthetic quality. Excellent. Lecturer's voice is well modulated; delivery is smooth
and conversational.

Editing. Poor. Numerous stops/starts during recording (approximately a dozen)
have garbled one or two words at the beginning of sentences, which is distracting and
occasionally confusing. Approximately 10 seconds of recording is obscured at
about the 80-foot mark of side 1.

Tape packaging. Poor. Sides (1 and 2) of tape are not marked. Cassette is
difficult to remove from its poorly designed case. Erasure-preventing devices on
cassette were not activated by vendor. Label is not permanent and was poorly
attached on the cassette purchased by the evaluator.

Content Evaluation.

1. Selection and Organization of Topics: Excellent

2. Use of Examples: Excellent

3. Clarity of Explanations: Excellent

4. Identification of Lecturer: Poor. Lecturer is identified only by name on
label. No address or institutional affiliation is given for Lecturer even though
he solicits communications from listeners at one point.

5. Accuracy of Scholarly Citations: Poor. Eisner volume is incorrectly cited as
Confronting Curriculum Evaluation. Bloom, Hastings, Madaus handbook
on formative evaluation is inadequately referenced as a "volume edited by
Bloom." Wittrock & Wiley are cited, but authors' names are not spelled.

Utilization of Uniqueness of Medium.

The tape must be rated poor on this criterion. The lecturer claims that the
opportunity to stop a recording is a unique feature of the medium ("the tape can
be stopped more easily than the eye can be stopped from glancing ahead"); however,
this claim cannot be substantiated in the opinion of the evaluator. Only about
five requests for stops are made, and these requests are not very compelling.
Furthermore they are probably inferior in eliciting thought when compared with
adjunct questions in a typescript accompanied by answers at the end of the text.

The claim is also made that the tape can be played under circumstances in which
reading is impossible or inconvenient (e.g., on airplanes or in cars). The range of
circumstances in which the cassette recording is more convenient is probably
smaller than the lecturer claims. Reading typescript on an airplane is quite
conveniently done; furthermore, considerations of fellow travelers' comfort would
require an earphone, usually a no-cost but often misplaced accessory. Whether the
cassette recording will be utilized as is hoped (primarily in automobiles when no
productive use of time would be made) remains to be seen. The data below bear
on this possibility.

Survey of Availability of Cassette Player and Incidence of Extended Commuting
Among AERA Members.

The following survey questionnaire was sent to a random sample of 100 members
of AERA:

Dear AERA member:

This survey is part of an evaluation of the AERA cassette tapes program. It is not
sanctioned by AERA; they are not aware that it is being conducted.
We would appreciate your answering the following questions:

1. Do you have access to a cassette tape player (i.e., do you own one or could you
borrow one at no cost)?

Yes No

2. Do you commute by car to work for more than 20 minutes each way?

Yes No

A total of 62 usable questionnaires were returned. The results permit the 
following conclusions regarding the availability of cassette players and their 
possible use while commuting to and from work: 

1. Results. Frequencies of Response with Percents of Total Sample.

                                    Access to a cassette player
                                     (yes)        (no)       Totals

Commute more than         (yes)    13 (21%)     2 ( 3%)    15 (24%)
20 minutes each way
to work                   (no)     39 (63%)     8 (13%)    47 (76%)

Totals                             52 (84%)    10 (16%)    62

2. Conclusions. 

• That 84%* of AERA members have access to a cassette tape player
indicates that AERA made a good choice of an "alternative" instructional
medium.

• Even though a substantial minority (24%)* of the AERA membership
spends sufficient time commuting by car to make the tape medium of
instruction advantageous, in terms of a head-count a substantial number
(about 2000) of AERA members do commute under conditions which
would permit instruction by cassette tapes.
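
The percentages in the results table follow directly from the four cell counts. The Python sketch below recomputes them. The dictionary layout and names are illustrative assumptions; the counts are those of the survey table (n = 62), where the upper-left cell of 13 is implied by the row total of 15 and the column total of 52.

```python
# Cell counts keyed by (commutes more than 20 minutes, has access to a player).
counts = {
    ("yes", "yes"): 13,
    ("yes", "no"): 2,
    ("no", "yes"): 39,
    ("no", "no"): 8,
}
n = sum(counts.values())  # number of usable questionnaires


def pct(k: int) -> int:
    """Percent of the total sample, rounded to the nearest whole percent."""
    return round(100 * k / n)


# Marginal totals: sum over the other dimension of the table.
access_yes = sum(v for (_, access), v in counts.items() if access == "yes")
commute_yes = sum(v for (commute, _), v in counts.items() if commute == "yes")

if __name__ == "__main__":
    # Sample size, % with access, % commuting, % with both.
    print(n, pct(access_yes), pct(commute_yes), pct(counts[("yes", "yes")]))
    # prints: 62 84 24 21
```

The recomputed marginals (84% with access, 24% commuting more than 20 minutes) agree with the table's totals row and column.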

VII. Outcome (Primary) Evaluation. 

Learning Rate. Even if the aural medium is as effective for transmitting
information as the visual medium (a question addressed later), it is undoubtedly
slower. The speech rate for the cassette recording in question is approximately
[illegible] words/minute (slightly slower than normal conversational English). This is
less than half what the reading rate would probably be for the typical listener
(the average college freshman reads newspaper prose at more than 300
words/minute).



*These sample estimates are subject to substantial sampling error because of the small sample size
(n = 62). The 95% confidence intervals on .84 and .24 are (.68, .92) and (.13, .40), respectively.






The effect on learning of this slower rate could be more serious than merely
doubling the time required to learn the content of the recording. The slower rate
of information presentation in the aural modality may tax the retentive powers of
short-term memory to the extent that comprehension is seriously impaired.

A compressed speech version of the recording might correct problems allegedly
associated with this low information transmission rate. Speech rates can be more
than doubled by means of speech compressors without impairing comprehension.
However, recording equipment may be prohibitively expensive.

Provisions for Arbitrary Access. Perhaps the principal disadvantage of recordings
as a teaching device is that access to material on a tape at arbitrary points is
awkward. Access to a particular section of a recorded lecture could be slower by a
factor of ten or more than access to the same section in a typescript.

Knowledge Acquisition in the Aural vs. the Visual Mode. The relative efficiency
of learning through visual and aural modalities has been debated in the history of
psychology at least since 1894. As with most comparative educational research,
the findings have been largely inconsistent and non-generalizable. Relative
efficiency appears to depend on such interactive factors as 1) meaningfulness of the
instructional material, 2) age of learner, 3) reading speed of learner, 4) intelligence
of learner, 5) difficulty of the instructional material, and 6) whether retention is
measured immediately or delayed. (For an excellent review of published studies on
this question, see Travers, R.M.W. et al. Research and Theory Related to
Audio-Visual Information Transmission, USOE Contract No. 3-20-003, 1967.)

A recent experiment relevant to the comparative effectiveness of the cassette
recording and typescript learning was performed by James R. Sanders (Short-term
and Long-term Retention Effects of Adjunct Questions in Aural Discourse, Ph.D.
thesis, Lab. of Educ. Research, Univ. of Colo., 1970). Sanders presented a 2000-
word biography of William James to 72 undergraduates in either the visual or aural
mode. Learning was measured immediately after presentation and one week later
with a multiple choice test. Results showed significantly (p ≤ .05) greater learning
in the visual mode (Sanders, 1970, p. 70).

VIII. Summative Judgments and Recommendations.

Judgments

The technical quality of the recording is good. The substantive content of the
lecture is excellent. The recording is substantially more expensive than a typescript
version of the same lecture and is probably less effective as a teaching device.

Recommendations

To the individual researcher seeking to upgrade his understanding of evaluation:
Do not purchase this recording. Instead, buy AERA Curriculum Evaluation
Monograph No. 1 and Suchman's Evaluative Research or purchase photocopies
of the following papers:

Cronbach, L. J. "Evaluation for course improvement." Teachers College
Record, 1963.

Scriven, M. "The methodology of evaluation." AERA Curriculum Evaluation
Monograph, No. 1, Chicago: Rand-McNally, 1967.

Stake, R. E. "The countenance of educational evaluation." Teachers College
Record, 1967.

If AERA offers for sale a typescript version of recording 6B (see Recommendations
below), purchase it at any price up to $1.00 but not in place of purchasing
photocopies of any of the above three papers or AERA Curriculum Evaluation
Monograph No. 1 (Rand-McNally, 1967).

To USOE:

Cease allocating funds to the production of instructional recordings unless a
compelling argument is presented that the instruction cannot be conducted in the
visual mode (e.g., some instruction in music; training in auditory discrimination
for young children; some instruction in speech pathology, linguistics, foreign
language; "talking books" for the blind).

Funds for training expended on development of products like that evaluated
here would be better spent in support of the AERA Research Training Sessions
program or in commissioning, reproducing, and disseminating training materials
in typescript form.

To AERA:

Offer for sale at 75¢ per copy (to include mailing and handling) a typescript of
the contents of Recording 6B. Offers for sale of the recording and the typescript
should not be made separately.

Produce the cassette on cheaper tapes for the purchaser whose circumstances
make it an effective, superior learning device.

IX. Circumstances Modifying the Summative Judgments (Scope of the Value 
Claims). 

The conclusion that the cost/effectiveness of the typescript version of the lecture
is greater than the cost/effectiveness of the cassette recording would not be
expected to hold (the superiority would be reversed) for sightless learners (who are
also not deaf).

The cassette recording may be effective and is probably less expensive than the
distribution of the typescript version for large groups (e.g., an undergraduate
class) for which simultaneous mass listening is possible.

The cassette recording may be the only way to reach a segment of the population
who might be characterized as "Reverse-Luddites" or "Mechanical Cultists," i.e.,
those persons who purchase electric carving knives, can openers, trail bikes,
complex stereo systems, etc., and who claim, with vague appeals to McLuhan,
that since books are passe they are no longer read.

X. Evaluating the Evaluator. 

Why an Evaluation?

Gratuitous evaluation of products for which the net social investments are small
can be a hostile act. Such "evaluations" can incur greater ultimate social costs
than they reduce by destroying a sense of community among producers and
evaluators, by creating defensiveness among producers who then refuse to
cooperate with evaluators, by eroding civility in human relations, etc. But in this
case, the product developer asked to have his product evaluated.

The Evaluator's Motives.

Evaluator's motives which would be served by a favorable overall judgment:
a) He is a member of the AERA Executive Board and would take satisfaction in
the success of any AERA-sponsored activity. b) Persons involved in the production
of the recording are colleagues of his and in a position indirectly to promote his
general welfare.

Evaluator's motives which would be served by an unfavorable overall judgment:
a) He declined an invitation to participate in the recording program on the grounds
that it did not make use of the unique features of the medium, and would not be
cost/effective compared to dissemination of the presentations in written form.
Hence, an unfavorable judgment would confirm his prejudgment and protect him
against feeling that an opportunity had been lost. b) He was once beaten in a
table-tennis match by the project director.

The evaluator has collected no representative data, either objective or
subjective, on attitudes toward the product or its effectiveness as a learning device.
His claim for the superiority of the typescript version of the lecture as a teaching
device is based on extrapolation of the findings of a half dozen experiments in
audio-visual research comparing learning in the aural and visual modes.

Educational Product Re-Evaluation

Michael Scriven 
University of California 
Berkeley 

1. Background 

(a). The editor of ER invited this response to Glass' (Jan. 72) evaluation of
the cassette I did for the AERA series concurrently with the acceptance of the
Glass manuscript for publication.

(b). As Glass notes, I explicitly invited evaluations of the cassette and in fact
offered a princely prize for the winning entry, namely, $8.00, the cost of the
cassette. Glass' entry currently holds the lead in the competition for this prize, for
the unimpeachably objective reason that it is the only one.

(c). It is of some interest that the production procedures of these cassettes
involved one step of formative evaluation. Popham brought the authors to Los
Angeles, where they uttered their talk into a microphone in a recording studio,
without audience. The talk was recorded and also piped into a nearby room, where
a number of experts and students heard it, and later critiqued it in a discussion
with the author. In the light of this critique, the author then rewrought the talk
at his leisure and taped it on a small portable which he was lent. The isolation of
the recording act at Los Angeles was intended to simulate this final production
procedure and to provide a chance to pick up technical deficiencies in recording
procedure.

(d). While the formative evaluation of my performance was quite favorable, I
decided to redo it completely. This involved some risks, not all of which paid off.
It is, for example, possible that the new attempt was worse than the original one,
and somewhat more probable that it was worse than a touched-up version of the
original would have been. A second cycle of feedback would have been ideal, but
was impractical. Three procedures are possible to handle such situations: (i) minimax
strategy would support prohibition of "new starts"; (ii) funds might be
budgeted to provide a second cycle in, say, 5% or 10% of the cases (I think I was
the only such case); (iii) the producer might, as he did, take the chance that the
author can improve his rating by making a fresh start. It would be interesting to
have some data on these strategies.

2. Self-Evaluations

I have criticized the authors of the Phi Delta Kappa report on evaluation for a
section in which they attempted a quasi-formal evaluation of their own book, in the
book. I argued that if they came up with anything negative, they should revise the
book, and if they didn't the endorsement was superfluous. That argument is
oversimplified but still seems plausible. Now reviewing one's own book, as I once did
by invitation, is only one stage better, but there is a time lag and a chance for new
critical input from others. Replying to reviews, as here, is then two stages from
self-endorsement. The probability of bias has scarcely evaporated, but its probable
direction is so obvious that it can do less harm than when concealed and there is a
chance of useful rebuttals. The most interesting problem for the author in this role
is distinguishing between giving excuses and rebutting criticisms. For example, one
might excuse technical defects in the tape as due to equipment deficiencies, but
this hardly affects the evaluator's complaint about them. Explanations may have
some value for future projects of the same kind, but in attempting to achieve the
best possible summative evaluation of the product, they are irrelevant. Since one's
products are seen by others, as well as oneself, as extensions of oneself, it's very
hard to avoid thinking of excuses as relevant. But I shall try to eschew them, and
focus on summative product evaluation, as does Glass. Defense is therefore
reserved against the usual inferences from the several deficiencies of product to those
of the author or producer.

3. Specific Reactions

(a). Technical-Hardware

Many of the comments made here seem correct, but one or two caveats should
be considered. The value of good quality tape and cassettes is not immediately
detectable. Both print-through and deterioration of signal-to-noise ratio under
heating and magnetizing cycles increase with the years, and the mechanical
components of a cassette are extremely susceptible to wear. The under $2.00 figure
quoted by AVI at U. of Colorado indicates severely substandard materials.
(Nevertheless, some saving might have been made here and it certainly would have been
preferable if more of the tape could have been recorded, assuming any merit in
the marginal material thereby added.) To the extent that teaching use is made of
the tape, exposure to worn and over-magnetized heads and defective tape
transports (common faults in classroom players) would increase the desirability of
better quality materials. The distortion noted by Glass was due to the amplifier
and speaker limitations of his player. Using an Advent 201 playing through
Macintosh electronics and AR transducers the results would, I judged, be quite
satisfactory in a 2500 seat auditorium, even with an audience present. Of present
portable players, Craig and Sony are pretty good products in the economy sector.

(b). Technical-Software

(Again, complaints not contested are conceded.) It seems likely that the
deficiencies in citations would not significantly handicap usual library search routines.
Feedback to me has proved possible for those who wrote to AERA or to Popham,
not an excessively taxing procedure.

(c). Crucial Comparisons

The general procedure of really working to get estimates of comparative cost
effectiveness seems to me absolutely correct and indeed the method of choice in all
educational evaluation. But the interpretation of the results by Glass may not be
unimpeachable. Let me try shaking the kaleidoscope of data up a little to see how
much stability there is in the image he reports. Consider the cassettes as serving
solely these purposes: (i) Providing an improvement over listening to the car radio,
or music tapes, for drivers interested in educational research. (The use by those
passengers in cars and on planes who find that reading in such circumstances
gives them a headache is another exclusive but small market use, which Glass
does not identify.) (ii) Providing a cheap surrogate for a visiting lecturer in
(approximately) graduate courses.

Now, it simply isn't interesting to compare cassettes with written materials
vis-a-vis use (i). No doubt we could all learn more than we do at home and in the
office, but the AERA hasn't discovered a motivator yet, and until someone does,
it seems a useful service to offer an optional educational filling for the interstices
of our life space. On cost and content there can be some serious criticisms, but the
medium really has no competition when used as described. (Note that the cost is
the same as that for commercial management training cassettes.)

Use (ii) is usually a luxury use, of course. Written material has cost, speed, and
replay advantages over tape. But it does not bring a new person into the didactic
teaching process in quite the same way. Even getting to hear educational research
personnel has some value in itself for the graduate student, as witness the reasons
given for attending convention programs. There is also a possibility that the impact
of several speakers will be stronger, motivationally, than one instructor plus
readings. Again, an instructor in a particular classroom situation may feel the
immediate importance of kicking in a diversion, a change of pace, an external
authority. Without arguing for the general superiority of tape teaching, one can
argue for its utility as a repertoire-enlarging device for special situations until
its contribution can be shown to be always negligible. Is this an adequate
justification for the use of the funds involved? That's a point of entry problem; it may be
that the funds and enthusiasm were not available for any other production.
Even if they were, the experimental commitment of AERA should justify trying a
number of innovations like this one. The previous success of this in the medical
in-service training area, and the management area, makes it a reasonable
experiment, not a wild one. Of course, "success" in other fields has been subjectively
and commercially determined, not by proven learning gains. But if real tests are to
be done, the strategy of doing it with AERA tapes and membership has a good deal
to recommend it over doing it on medical tapes, for example, and trying to guess
an extrapolation to educational researchers. So my principal criticism of the Glass
evaluation concerns the choice of the main crucial comparison. It should not have
been the typescript, but just the better content/cheaper package cassette. Broadly,
evaluation should take care not to saddle the product with too large a target
population, one of the fallacies of "value dilution."

(d). Use of Medium

How could the content have been improved? There are many good answers to
that, and Glass picks up several. I am not persuaded by his case for a poor rating
on "use of the medium," however. To some extent, we are just trading hunches on
this. I think it's harder to stop your eye skimming ahead on written material than
it is to stop a tape; he does not. He thinks that my requests to stop are too few and
not very compelling, etc. But I am not persuaded here, mainly because he does
not suggest what would be good utilization of the medium. From my perspective,
the most important factor is comprehensibility at listening speed, which Glass
grants me under another heading. The interrogation idea was the only distinctive
one I could think up. I expect there are others, but I'm not convinced that a poor
rating on this dimension is justified until I see them.

4. Wider Horizons 

(a). I was so impressed by Glass' willingness to do field research in the course of
his evaluation that I felt my response should also be based on a firm empirical
foundation. Extensive field trials on a naive graduate student population have
strongly confirmed my own belief in the existence of other populations besides
those identified by Glass, or discussed above, for which this cassette may be useful.
Glass affirms utility for:

(i) "sightless learners (who are not also deaf)";

(ii) "large groups (e.g. an undergraduate class) for which simultaneous mass
listening is possible";

(iii) "'Reverse-Luddites' or 'Mechanical Cultists', i.e., those persons who
purchase electric carving knives . . . trail bikes . . . etc., and who claim,
with vague appeals to McLuhan, that since books are passe they are no
longer read."

My survey indicates that group (iii) is too narrowly conceived. There really are
normal people who prefer listening to discussions on the radio over reading the
transcript. And the cassette is controllable in a way the radio is not: no need for
sudden dashes during commercials, for example. In the individual's diurnal prime,
about 11 a.m., 2 p.m., and 7:30 p.m., reading works pretty well. But at the cyclic
low points of the day (7:45 a.m., 4 p.m., and 11:45 p.m.), there is a switch in
optimal modality; a characteristic pattern of lying back with the eyes closed
emerges, at which times auditory input is quite acceptable. Further details of this
study must await replication, which I shall attend with confidence that some of the
minor deficiencies in what is, after all, a pilot study (n = 1) are more than
compensated by the quality of the naive graduate student population (my spouse).

(b). Evaluation of educational products frequently, but understandably,
overlooks the learning pay-off for the intermediary population, usually the teacher. In
this case the producer (probably) and lecturer (certainly) have learnt a great deal
from producing this cassette. This is a small group, but one with potentiality for
significant further effect on the theory and practice of evaluation. A substantial
part of this learning has come from the evaluation of the cassette by Glass.